Norms

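L1 norm: distance in one dimension

The sum of the absolute values of the elements:

    $\displaystyle \|x\|_1 = \sum_{i=1}^{n}{|x_i|}$

For two numbers a and b, |a-b| (subtract, then take the absolute value) is the usual way to compute a distance in one dimension. For vectors, the absolute differences of the elements are added up.
This sum is called the L1 norm.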
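
As a small illustration of the definition above, the following sketch computes the L1 distance between two vectors by hand and compares it with `torch.linalg.norm`. The vectors `a` and `b` are made-up values chosen only for this example.

```python
import torch

# Two example vectors (illustrative values, not from the original text)
a = torch.tensor([1.0, -2.0, 3.0])
b = torch.tensor([0.5, 1.0, 5.0])

# L1 distance: sum of the absolute element-wise differences
l1_manual = (a - b).abs().sum()

# The same quantity as the 1-norm of the difference vector
l1_builtin = torch.linalg.norm(a - b, ord=1)

print(l1_manual.item())                      # 5.5
print(torch.allclose(l1_manual, l1_builtin)) # True
```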

L2 norm: distance in multiple dimensions

 
The square root of the sum of the squares of the elements

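Take two vectors
A=(a1,a2,...,an)
B=(b1,b2,...,bn)
Let xi = ai - bi. The distance between A and B can then be written as

    $\displaystyle \|x\|_2 = \sqrt{\sum_{i=1}^{n}{x_i^2}} = \sqrt{\sum_{i=1}^{n}{\big(a_i-b_i\big)^2}}$

This is the L2 norm of x, which is also the Euclidean distance between A and B. It measures distance (difference) across multiple dimensions.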

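The L2 norm extends the notion of distance from one dimension to many.

A vector has several dimensions. Just as flowers differ in how lush their leaves are and how full their blossoms are, things differ from each other along many directions/aspects/dimensions.

The differences in the individual dimensions accumulate into a total difference between the two things.

The L2 norm is one mathematical way to measure that total difference.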
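
To make the formula concrete, here is a minimal sketch that computes the Euclidean distance between two vectors by hand and with `torch.linalg.norm`. The vectors `A` and `B` are illustrative values, not data from the text.

```python
import torch

# Two example vectors (illustrative values)
A = torch.tensor([1.0, 2.0, 3.0])
B = torch.tensor([4.0, 6.0, 3.0])
x = A - B                                 # per-dimension differences: (-3, -4, 0)

# L2 norm by the definition: square, sum, square root
l2_manual = torch.sqrt((x ** 2).sum())

# torch.linalg.norm uses the 2-norm by default for a 1-D tensor
l2_builtin = torch.linalg.norm(x)

# For comparison, the L1 norm of the same difference vector
l1 = x.abs().sum()

print(l2_manual.item())                       # 5.0
print(torch.allclose(l2_manual, l2_builtin))  # True
print(l1.item())                              # 7.0
```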

L1 Loss

L1 Loss: take the absolute error of each element, then average the absolute errors

 
nn.L1Loss(size_average=None, reduce=None, reduction: str = 'mean') -> None

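Creates a criterion that measures the mean absolute error (MAE) between each element in

the input :math:`x` and target :math:`y`.

The size_average and reduce parameters are deprecated; use reduction: str = 'mean' instead.

The unreduced (i.e. with :attr:`reduction` set to ``'none'``) loss can be described as:

    $\displaystyle \ell(x, y) = L = \{l_1,\dots,l_N\}^\top, \quad l_n = \left| x_n - y_n \right|$

where :math:`N` is the batch size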

import torch
from torch import nn 

loss = nn.L1Loss()
input = torch.randn(3, 5, requires_grad=True)
target = torch.randn(3, 5)
output = loss(input, target)
output.backward()

output
tensor(1.1354, grad_fn=<MeanBackward0>)
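
To check what `nn.L1Loss` computes, the sketch below compares its three `reduction` modes against a hand-written absolute difference. The seed and tensor shapes are arbitrary choices for the example.

```python
import torch
from torch import nn

torch.manual_seed(0)          # arbitrary seed, only for repeatability
x = torch.randn(3, 5)
y = torch.randn(3, 5)

# The three reduction modes of nn.L1Loss
per_element = nn.L1Loss(reduction='none')(x, y)   # same shape as the input: (3, 5)
mean_loss = nn.L1Loss(reduction='mean')(x, y)     # the default
sum_loss = nn.L1Loss(reduction='sum')(x, y)

# Hand-written version: l_n = |x_n - y_n|
manual = (x - y).abs()

print(torch.allclose(per_element, manual))        # True
print(torch.allclose(mean_loss, manual.mean()))   # True
print(torch.allclose(sum_loss, manual.sum()))     # True
```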

SmoothL1Loss

Smooth L1 Loss is a smoothed version of L1 Loss

 
Init signature:
nn.SmoothL1Loss(
    size_average=None,
    reduce=None,
    reduction: str = 'mean',
    beta: float = 1.0,
) -> None
Docstring:     
Creates a criterion that uses a squared term if the absolute
element-wise error falls below beta and an L1 term otherwise.
It is less sensitive to outliers than :class:`torch.nn.MSELoss` and in some cases
prevents exploding gradients (e.g. see the paper `Fast R-CNN`_ by Ross Girshick).


For a batch of size :math:`N`, the unreduced loss can be described as:

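    $\displaystyle \ell(x, y) = L = \{l_1,\dots,l_N\}^\top, \quad l_n = \begin{cases} 0.5\,(x_n - y_n)^2 / \text{beta}, & \text{if } |x_n - y_n| < \text{beta} \\ |x_n - y_n| - 0.5\,\text{beta}, & \text{otherwise} \end{cases}$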

 
.. note::
Smooth L1 loss can be seen as exactly :class:`L1Loss`, but with the :math:`|x - y| < beta`
portion replaced with a quadratic function such that its slope is 1 at :math:`|x - y| = beta`.
The quadratic segment smooths the L1 loss near :math:`|x - y| = 0`.

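When x and y differ by less than beta, the squared error is used, scaled by 0.5/beta; whether this enlarges or shrinks the difference depends on the value of beta.
Compared with MSELoss, this can prevent exploding gradients: when the error is large, the loss no longer uses the square but falls back to the absolute value (i.e. L1Loss),
minus a positive constant (0.5*beta), which makes the loss slightly smaller and keeps the two pieces continuous at |x - y| = beta.
Either way, both pieces grow as the error grows, so the loss still reflects the difference between the two distributions.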

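The squared (L2) loss is easily dominated by outliers, while the gradient of L1 Loss has constant magnitude and is undefined at 0, which makes it hard to converge precisely near the minimum.
Smooth L1 Loss is quadratic (strongly convex) near 0 and linear for large errors, so it combines the advantages of the squared loss and the absolute-value loss.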
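
The piecewise behaviour described above can be sketched as follows. `smooth_l1_manual` is a hypothetical helper written for this note (it is not part of PyTorch), and the input values are made up so that both branches are exercised.

```python
import torch
from torch import nn

def smooth_l1_manual(x, y, beta=1.0):
    """Hand-written Smooth L1 (hypothetical helper, for illustration only)."""
    diff = (x - y).abs()
    quadratic = 0.5 * diff ** 2 / beta     # used where |x - y| < beta
    linear = diff - 0.5 * beta             # used otherwise
    return torch.where(diff < beta, quadratic, linear)

# Errors of 0, 0.5, 3 and 10: the last two play the role of outliers
x = torch.tensor([0.0, 0.5, 3.0, -10.0])
y = torch.zeros(4)

manual = smooth_l1_manual(x, y, beta=1.0)
builtin = nn.SmoothL1Loss(reduction='none', beta=1.0)(x, y)
mse = nn.MSELoss(reduction='none')(x, y)

print(manual.tolist())                   # [0.0, 0.125, 2.5, 9.5]
print(torch.allclose(manual, builtin))   # True
print(mse.tolist())                      # [0.0, 0.25, 9.0, 100.0]
```

For the error of 10, the squared loss is 100 while Smooth L1 is 9.5, which shows why it reacts less strongly to outliers than MSELoss.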

L2 Loss/MSELoss
MSELoss is the mean of the squared element-wise differences:

 
    $\displaystyle \frac{1}{n}  \sum_{i=1}^{n}{ \big(a_i-b_i\big)^2}$

Official example

 
import torch
from torch import nn 

loss = nn.MSELoss()
input = torch.randn(3, 5, requires_grad=True)
target = torch.randn(3, 5)
output = loss(input, target)
output.backward()

output
tensor(1.0446, grad_fn=<MseLossBackward0>)

nn.MSELoss(size_average=None, reduce=None, reduction: str = 'mean') -> None
    
Creates a criterion that measures the mean squared error (squared L2 norm) 
between each element in the input :math:`x` and target :math:`y`.

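The unreduced (i.e. with :attr:`reduction` set to ``'none'``) loss can be described as:

    $\displaystyle \ell(x, y) = L = \{l_1,\dots,l_N\}^\top, \quad l_n = \left( x_n - y_n \right)^2$

where :math:`N` is the batch size. 
If :attr:`reduction` is not ``'none'`` (default ``'mean'``), then:

    $\displaystyle \ell(x, y) = \begin{cases} \operatorname{mean}(L), & \text{if reduction} = \text{'mean'} \\ \operatorname{sum}(L), & \text{if reduction} = \text{'sum'} \end{cases}$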
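
A quick way to see the effect of `reduction` is to run `nn.MSELoss` on a tiny tensor whose squared errors are easy to compute by hand. The values of `x` and `y` below are made up for the example.

```python
import torch
from torch import nn

# Squared errors by hand: [[0, 4], [9, 0]]
x = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
y = torch.tensor([[1.0, 0.0], [0.0, 4.0]])

per_element = nn.MSELoss(reduction='none')(x, y)   # l_n = (x_n - y_n)^2
mean_loss = nn.MSELoss(reduction='mean')(x, y)     # mean(L), the default
sum_loss = nn.MSELoss(reduction='sum')(x, y)       # sum(L)

print(per_element.tolist())  # [[0.0, 4.0], [9.0, 0.0]]
print(mean_loss.item())      # 3.25
print(sum_loss.item())       # 13.0
```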

Shape

 
- Input: :math:`(*)`, where :math:`*` means any number of dimensions.
- Target: :math:`(*)`, same shape as the input.

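MSELoss expects the model output and the target to have the same shape.
Unlike cross-entropy, it does not accept targets given as class indices.
If the shapes differ but can be broadcast, PyTorch emits a warning and broadcasts them, which is likely to give an incorrect loss, so whenever MSE is used it is worth checking that the target and the model output have the same shape.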
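
One possible form of that shape check is sketched below. `mse_with_shape_check` is a hypothetical helper name introduced for this note, and raising `ValueError` is just one design choice; an `assert` would work as well.

```python
import torch
from torch import nn

def mse_with_shape_check(y_out, y):
    """Hypothetical helper: refuse to compute MSE on mismatched shapes."""
    if y_out.shape != y.shape:
        raise ValueError(
            f"shape mismatch: output {tuple(y_out.shape)} vs target {tuple(y.shape)}"
        )
    return nn.functional.mse_loss(y_out, y)

y_out = torch.zeros(4, 1)   # e.g. a model output with a trailing feature dim
y = torch.ones(4)           # target missing that trailing dim

try:
    mse_with_shape_check(y_out, y)
except ValueError as err:
    print(err)              # shape mismatch: output (4, 1) vs target (4,)

# After aligning the shapes, the loss is computed as usual
loss = mse_with_shape_check(y_out, y.unsqueeze(1))
print(loss.item())          # 1.0
```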

 

    

 
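loss_fn = nn.MSELoss()

# transformer (the model) and trainDataLoader are assumed to be defined earlier
for X, y in trainDataLoader:
    y_out = transformer(X)
    print(y_out.shape, y.shape)    # torch.Size([64, 100, 1]) torch.Size([64, 100, 1])
    loss = loss_fn(y_out, y)       # output and target have the same shape
    loss.backward()                # computes gradients only; no parameter update
    print(round(loss.item(), 5))   # 5.5041
    break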
    

 
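Unlike the official example above, the input data here is not created with requires_grad=True.

The input X is a constant; the model's parameters have requires_grad=True set automatically inside the model.
- requires_grad is hidden behind the scenes, so over time it is easy to conclude that requires_grad is no longer needed
- In fact, what actually changes during training is the parameters, and a parameter update takes two steps:
- 1. compute the gradient, 2. subtract: w -= learning_rate*w.grad
  
Also, this is only an example: no optimizer is attached, so no gradient descent step is performed and the parameters are never updated.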
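
For completeness, here is a minimal sketch of what the loop looks like once an optimizer is added. The `nn.Linear` model, the synthetic data, the learning rate and the number of steps are all stand-ins chosen for this example; they are not the transformer or the data loader used above.

```python
import torch
from torch import nn

torch.manual_seed(0)                      # arbitrary seed, only for repeatability

model = nn.Linear(1, 1)                   # stand-in for the real model
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

X = torch.randn(64, 100, 1)               # input data: a constant, no requires_grad
y = 2.0 * X + 1.0                         # synthetic target with the same shape

print(X.requires_grad)                                    # False
print(all(p.requires_grad for p in model.parameters()))   # True

losses = []
for _ in range(50):
    optimizer.zero_grad()                 # clear gradients from the previous step
    loss = loss_fn(model(X), y)
    loss.backward()                       # step 1: compute the gradients
    optimizer.step()                      # step 2: for plain SGD, w -= lr * w.grad
    losses.append(loss.item())

print(losses[0] > losses[-1])             # True
```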

 

  

 
PyTorch does not provide an API named RMSELoss for computing the root mean squared error (RMSE).
However, RMSE can be obtained by computing the mean squared error (MSE) loss and taking its square root.
The steps and a code example follow:
  

 
import torch
import torch.nn as nn

# Suppose we have some predictions and targets
predictions = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
targets = torch.tensor([1.5, 2.5, 3.5])

# Define the MSE loss function
mse_loss = nn.MSELoss()

# Compute the MSE
mse = mse_loss(predictions, targets)

# Compute the RMSE
rmse = torch.sqrt(mse)

print("MSE:", mse.item())    # MSE: 0.25
print("RMSE:", rmse.item())  # RMSE: 0.5
  

 
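When the MSE is between 0 and 1 (exclusive), the RMSE is greater than the MSE, because the square root of such a number is larger than the number itself.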
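
The relationship between the two values on either side of 1 can be checked with plain arithmetic. The three MSE values below are arbitrary examples.

```python
import math

# (MSE, RMSE) pairs for an MSE below 1, equal to 1, and above 1
rows = [(mse, math.sqrt(mse)) for mse in (0.25, 1.0, 4.0)]

for mse, rmse in rows:
    print(mse, rmse, rmse > mse)
# 0.25 0.5 True
# 1.0 1.0 False
# 4.0 2.0 False
```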
  

Custom RMSE loss function

 
import torch
import torch.nn as nn

# Suppose we have some predictions and targets
predictions = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
targets = torch.tensor([1.5, 2.5, 3.5])

class RMSELoss(nn.Module):
    def __init__(self):
        super().__init__()
        self.mse = nn.MSELoss()
        
    def forward(self, predictions, targets):
        # RMSE is the square root of the MSE
        mse = self.mse(predictions, targets)
        rmse = torch.sqrt(mse)
        return rmse

# Use the custom RMSE loss function
rmse_loss = RMSELoss()
loss = rmse_loss(predictions, targets)
print("Custom RMSE Loss:", loss.item())  # Custom RMSE Loss: 0.5
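
One thing to watch with a square-root loss is the gradient when the MSE is exactly 0, since the derivative of the square root is unbounded there. The sketch below shows a commonly used workaround: adding a small constant before taking the root. `StableRMSELoss` and the value of `eps` are assumptions made for this example, not part of PyTorch or of the code above.

```python
import torch
from torch import nn

class StableRMSELoss(nn.Module):
    """Hypothetical RMSE variant; eps is an assumed small constant."""
    def __init__(self, eps=1e-8):
        super().__init__()
        self.mse = nn.MSELoss()
        self.eps = eps

    def forward(self, predictions, targets):
        # Adding eps keeps the square root away from 0
        return torch.sqrt(self.mse(predictions, targets) + self.eps)

targets = torch.tensor([1.5, 2.5, 3.5])

# Plain sqrt(MSE) with a perfect prediction: the gradient is not finite
p1 = targets.clone().requires_grad_(True)
torch.sqrt(nn.MSELoss()(p1, targets)).backward()
print(torch.isfinite(p1.grad).all().item())   # False

# With eps the gradient stays finite
p2 = targets.clone().requires_grad_(True)
StableRMSELoss()(p2, targets).backward()
print(torch.isfinite(p2.grad).all().item())   # True
```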
  

 

  

 

