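PyTorch builds its computation graph dynamically. The graph is not built from the moment of import torch; nodes are added as operations run on tensors that require gradients. Each resulting tensor is recorded in the graph together with the operation that produced it (its grad_fn). Calling backward() on a tensor walks this graph in reverse, starting from that tensor. It accumulates the gradient of every leaf tensor with requires_grad=True into that tensor's grad attribute, so the gradient is stored as an attribute of the tensor.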
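Here is a minimal sketch of this, using an arbitrary scalar w that does not come from the original notes. The graph only appears as operations run, and backward() accumulates the gradient into the leaf's grad.

```python
import torch

# A leaf tensor created directly by the user; autograd will track it
w = torch.tensor(2.0, requires_grad=True)

# Every operation on it adds a node to the graph as the code runs
y = w * 3        # y.grad_fn is a MulBackward0 node
z = y ** 2       # z.grad_fn is a PowBackward0 node

print(w.is_leaf, w.grad_fn)  # True None
print(y.is_leaf)             # False: y was produced by an operation

# backward() starts at z and walks the graph back to the leaf w
z.backward()

# z = (3w)^2 = 9w^2, so dz/dw = 18w = 36 at w = 2
print(w.grad)  # tensor(36.)
```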
Problem code:
x = torch.exp(x-x.max())
for i in range(batch_size):
    x[i]=x[i]/x[i].sum()
The problem is this line:
x[i]=x[i]/x[i].sum()
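Writing into x[i] modifies the data inside x in place, and autograd does not allow that here. exp() saves its output x for the backward pass, because the derivative of exp is exp itself. The division also saves the row x[i] it read. Overwriting these saved tensors in place makes backward() raise a RuntimeError saying that a variable needed for gradient computation has been modified by an inplace operation.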
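The failure can be reproduced with a small sketch. Here batch_size = 2 and a random logits tensor are hypothetical stand-ins for the real input. The forward loop runs without complaint, but backward() fails.

```python
import torch

batch_size = 2                                      # hypothetical batch size
logits = torch.randn(batch_size, 3, requires_grad=True)  # hypothetical input

x = torch.exp(logits - logits.max())  # exp() saves its output x for backward
for i in range(batch_size):
    x[i] = x[i] / x[i].sum()          # in-place write into a row of x

try:
    x.sum().backward()
    raised = False
except RuntimeError as err:
    # typically: "one of the variables needed for gradient computation
    # has been modified by an inplace operation"
    raised = True
    print(err)

print(raised)  # True
```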
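detach() and clone() are often suggested as fixes, but both have limits. detach() cuts x out of the graph, so no gradient would flow back to the input. clone() only helps if no tensor saved for backward is overwritten afterwards.
Reassigning the whole of x (x = ...) is fine, and so is writing the results to a new variable. The code below uses a new variable.
The code below produces a warning saying that torch.tensor([]) will be treated (recorded) as a constant.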
# Shift all values to be non-positive, so exp() cannot produce huge numbers
x = torch.exp(x-x.max())
out = torch.tensor([])
for i in range(batch_size):
    row = (x[i]/x[i].sum()).unsqueeze(dim=0)
    out = torch.cat((out, row), dim=0)
Final code (the first row seeds out, so no empty tensor is needed):
x = torch.exp(x-x.max())
out = (x[0]/x[0].sum()).unsqueeze(dim=0)
for i in range(1,batch_size):
    row = (x[i]/x[i].sum()).unsqueeze(dim=0)
    out = torch.cat((out, row), dim=0)
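There are two ways to avoid both the in-place writes and the empty starting tensor. The sketch below uses hypothetical shapes and random logits. One option is to stack the rows once at the end; the other is to normalise every row in a single vectorised step. torch.softmax appears only as a cross-check; it is not claimed to be what the original code was replaced with.

```python
import torch

torch.manual_seed(0)
batch_size, num_classes = 4, 5   # hypothetical shapes for the demo
logits = torch.randn(batch_size, num_classes, requires_grad=True)

x = torch.exp(logits - logits.max())

# Variant 1: collect the rows in a list and stack once
# (no empty seed tensor, no repeated torch.cat copies)
rows = [x[i] / x[i].sum() for i in range(batch_size)]
out_loop = torch.stack(rows, dim=0)

# Variant 2: fully vectorised; keepdim=True keeps the row sums as shape
# (batch_size, 1) so they broadcast across each row
out = x / x.sum(dim=1, keepdim=True)

# Both agree with each other and with the built-in softmax over dim=1
print(torch.allclose(out_loop, out))                                 # True
print(torch.allclose(out, torch.softmax(logits, dim=1), atol=1e-6))  # True

# Gradients flow back to logits; nothing saved by autograd was modified in place
out[:, 0].sum().backward()
print(logits.grad.shape)  # torch.Size([4, 5])
```

torch.stack avoids re-copying the growing out on every iteration, which repeated torch.cat does. The vectorised form removes the Python loop entirely.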
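Scalar, floating point, differentiable: backward() can be called without arguments only on a scalar (0-dim) tensor, and only floating-point tensors can have requires_grad=True.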
import torch
a = torch.tensor(1.0, requires_grad=True)
a.ndim  # 0
a.backward()
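Here is a short sketch of the three constraints named above. The tensor v and the integer example are illustrative only.

```python
import torch

# A 0-dim float tensor: backward() needs no arguments, and da/da = 1
a = torch.tensor(1.0, requires_grad=True)
a.backward()
print(a.grad)  # tensor(1.)

# Only floating-point (or complex) tensors may require gradients
try:
    torch.tensor(1, requires_grad=True)  # int64 tensor
    int_allowed = True
except RuntimeError:
    int_allowed = False
print(int_allowed)  # False

# For a non-scalar result, backward() needs an explicit gradient argument
v = torch.tensor([1.0, 2.0], requires_grad=True)
try:
    (v * 2).backward()
    implicit_ok = True
except RuntimeError:
    implicit_ok = False
print(implicit_ok)  # False

(v * 2).backward(torch.ones(2))  # same as differentiating sum(2v)
print(v.grad)  # tensor([2., 2.])
```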
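A tensor's grad is only filled in when the tensor was created with requires_grad=True; for other tensors it stays None. backward() only makes sense on a result that depends on at least one such tensor. Otherwise it raises a RuntimeError.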
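This sketch shows what happens when nothing in the computation requires gradients. It reuses the cos example, with requires_grad left at its default.

```python
import torch

a = torch.tensor([1., 2., 3.])  # requires_grad defaults to False
out = a.cos().sum()
print(a.requires_grad, out.grad_fn)  # False None

try:
    out.backward()
    backward_ok = True
except RuntimeError as err:
    # e.g. "element 0 of tensors does not require grad and does not have a grad_fn"
    backward_ok = False
print(backward_ok, a.grad)  # False None
```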
import torch
a = torch.tensor([1., 2., 3.], requires_grad=True)
print(a.grad)  # None
out = a.cos()
out.sum().backward()  # backward() fills grad for every leaf tensor in the graph that out depends on
print(a.grad)  # tensor([-0.8415, -0.9093, -0.1411]), i.e. -sin(a)
backward() does not update the grad of unrelated tensors:
import torch
a = torch.tensor([1., 2., 3.], requires_grad=True)
print(a.grad) # None
b = torch.tensor([1., 2., 3.], requires_grad=True)
out = a.cos()
out.sum().backward()
print(a.grad) #tensor([-0.8415, -0.9093, -0.1411])
print(b.grad) #None
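A leaf tensor with requires_grad=True cannot be modified in place. a[2] is a view of the leaf a, and assigning to it is an in-place operation that autograd rejects, which is why the error message mentions a view: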
import torch
a = torch.tensor([1., 2., 3.], requires_grad=True)
a[2]=1
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
/tmp/ipykernel_389/2764311836.py in <module>
2
3 a = torch.tensor([1., 2., 3.], requires_grad=True)
----> 4 a[2]=1
RuntimeError: a view of a leaf Variable that requires grad is being used in an in-place operation.
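If a leaf really has to be modified in place, the usual approach is to do it inside torch.no_grad(), so that autograd does not record the write. Optimizers update parameters this way. Here is a minimal sketch on the same tensor:

```python
import torch

a = torch.tensor([1., 2., 3.], requires_grad=True)

# Inside no_grad() the write is not recorded by autograd, so it is allowed
with torch.no_grad():
    a[2] = 1

print(a)                         # tensor([1., 2., 1.], requires_grad=True)
print(a.is_leaf, a.requires_grad)  # True True
```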
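In the whole model computation, only the model parameters get gradients. nn.Module registers its parameters with requires_grad=True. Input data such as x is created with requires_grad=False by default, so its grad stays None.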
import torch
from torch import nn
class Model(torch.nn.Module):
    def __init__(self, in_feature=1, out_feature=3):
        super().__init__()
        self.line_layer = nn.Linear(in_features=in_feature, out_features=out_feature)
    def forward(self, x):
        x = self.line_layer(x)
        return x
model = Model(in_feature=1, out_feature=3)
for param in model.parameters():
    print(param)
Parameter containing:
tensor([[-0.3010],
[-0.1164],
[-0.1319]], requires_grad=True)
Parameter containing:
tensor([-0.0333, -0.6954, -0.8287], requires_grad=True)
x = torch.randn(2, 1)
y_pred = model(x)
print(y_pred.shape)  # torch.Size([2, 3])
label = torch.tensor([0, 0]).unsqueeze(dim=1).long()
print(label.shape)  # torch.Size([2, 1])
def loss_fn(model_out, label):
    # toy loss: mean difference between predictions and labels (label broadcasts over the 3 outputs)
    _mean = (model_out - label).mean()
    return _mean
backward() only computes the gradients (grad). Updating the model parameters from those gradients is the optimizer's job.
loss = loss_fn(y_pred,label)
loss.backward()
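Here is a self-contained sketch of the same setup. It uses nn.Linear directly instead of the Model wrapper, and plain SGD with lr=0.1; both choices are illustrative assumptions. After backward() the parameters have gradients while the input does not, and only optimizer.step() changes the weights.

```python
import torch
from torch import nn

torch.manual_seed(0)
model = nn.Linear(in_features=1, out_features=3)  # stands in for Model.line_layer
x = torch.randn(2, 1)                             # input data, requires_grad=False
label = torch.tensor([0, 0]).unsqueeze(dim=1).long()

loss = (model(x) - label).mean()                  # same toy loss as loss_fn
loss.backward()

# The parameters received gradients; the input data did not
print(model.weight.grad.shape, model.bias.grad.shape)  # torch.Size([3, 1]) torch.Size([3])
print(x.requires_grad, x.grad)                          # False None
had_grads = model.weight.grad is not None and model.bias.grad is not None

# backward() left the parameters untouched; the optimizer step changes them
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)  # assumed optimizer and lr
bias_before = model.bias.detach().clone()
optimizer.step()        # param -= lr * param.grad
optimizer.zero_grad()   # clear the gradients before the next backward()
print(torch.equal(bias_before, model.bias.detach()))  # False: the bias moved
```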
Tensor.detach()
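Returns a new tensor that is detached from the current computation graph but still shares storage with the original tensor. The only difference is that its requires_grad is False. The returned tensor never needs a gradient, never gets a grad, and gradients do not flow back through it.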
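Here is a minimal sketch of the storage sharing, using an arbitrary tensor a. An in-place change made through the detached tensor shows up in the original, while clone().detach() gives an independent copy.

```python
import torch

a = torch.tensor([1., 2., 3.], requires_grad=True)
b = a.detach()

print(b.requires_grad, b.grad_fn)    # False None
print(b.data_ptr() == a.data_ptr())  # True: b shares a's storage

b[0] = 10.0         # an in-place edit through b...
print(a[0].item())  # 10.0  ...is visible through a as well

c = a.clone().detach()  # independent copy, also outside the graph
c[1] = -1.0
print(a[1].item())      # 2.0: a is unchanged
```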
Calling torch.tensor() on data that is already a tensor triggers this warning:
UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach()
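Fix: replace torch.tensor(data=x).float() with torch.as_tensor(data=x).float(). as_tensor does not copy when x is already a tensor with the requested dtype and device, so the result can share memory with x. Use x.clone().detach() when an independent copy is needed.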
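This sketch compares the three constructors on a hypothetical float32 tensor src. The warning is captured with the standard warnings module so that it can be checked.

```python
import warnings

import torch

src = torch.tensor([1., 2., 3.])  # hypothetical source tensor (float32)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    t1 = torch.tensor(src)          # copy-constructs from a tensor: UserWarning
print(any(issubclass(w.category, UserWarning) for w in caught))  # True

t2 = src.clone().detach()           # recommended explicit copy, no warning
t3 = torch.as_tensor(src).float()   # no copy: src is already float32 on the same device

print(t2.data_ptr() == src.data_ptr())  # False: independent copy
print(t3.data_ptr() == src.data_ptr())  # True: shares memory with src
```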
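PyTorch: what .detach() and .detach_() do and how they differ. detach() returns a new tensor and leaves the original in the graph, while detach_() detaches the tensor itself in place.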
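Here is a minimal sketch of the difference, using an arbitrary tensor a. detach() returns a new tensor c and leaves b in the graph, while detach_() cuts b itself out.

```python
import torch

a = torch.tensor([1., 2.], requires_grad=True)
b = a * 2

c = b.detach()                 # new tensor; b itself is still part of the graph
b_tracked_before = b.requires_grad
print(b.requires_grad, c.requires_grad)  # True False

b.detach_()                    # in place: b itself is cut out of the graph
print(b.requires_grad, b.grad_fn, b.is_leaf)  # False None True
```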