Is the gradient of the sum equal to the sum of the gradients for a neural network in PyTorch?

Problem description

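Suppose I have the code below and I want to compute the Jacobian of L, the prediction made by a neural network in PyTorch. L has size n x 1, where n is the number of samples in the mini-batch. To avoid a for loop over each of the n entries of L to compute the Jacobian for every sample in the mini-batch, some code I found simply sums the n predictions of the network (L) and then computes the gradient of that sum with respect to the inputs. First, I don't understand why, in PyTorch, the gradient of the sum is the same as the sum of the per-sample gradients. Second, I tried both the sum and a for loop, and the results differ. Is that due to numerical approximation, or because the sum simply doesn't make sense?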

The code is below; both functions belong to an nn.Module:

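import torch
import torch.nn.functional as F
from torch.autograd import grad

def forward(self, x):
    with torch.set_grad_enabled(True):
        def function(x, t):
            self.n = n = x.shape[1] // 2

            # Track gradients with respect to the input batch.
            qqd = x.requires_grad_(True)
            # Sum the n per-sample predictions into a single scalar.
            L = self._lagrangian(qqd).sum()
            # Gradient of the summed prediction with respect to the inputs.
            J = grad(L, qqd, create_graph=True)[0]


def _lagrangian(self, qqd):
    x = F.softplus(self.fc1(qqd))
    x = F.softplus(self.fc2(x))
    x = F.softplus(self.fc3(x))
    L = self.fc_last(x)
    return L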

Solution

I think it should be. Here is a toy example:

import torch

w = torch.tensor([2.], requires_grad=True)
x1 = torch.tensor([3.], requires_grad=True)
x2 = torch.tensor([4.], requires_grad=True)
y = w * x1 + w * x2  # dy/dw = x1 + x2
y.backward()  # calculate gradient

>>> w.grad
tensor([7.])
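
The toy example above takes the gradient with respect to a shared weight, while the question asks about the gradient with respect to the inputs. The same linearity applies there: if prediction L_i depends only on input row x_i, then d(sum_j L_j)/dx_i = dL_i/dx_i, so a single backward pass through the sum returns every per-sample gradient, stacked row by row. The following is a minimal sketch of that check, not the asker's model: `net` is a hypothetical small MLP standing in for the network in the question, and float64 is an assumption chosen so that the comparison is tight.

```python
import torch
import torch.nn as nn
from torch.autograd import grad

torch.manual_seed(0)

# Hypothetical stand-in for the network in the question:
# a small softplus MLP that outputs one scalar per sample.
net = nn.Sequential(
    nn.Linear(4, 8), nn.Softplus(),
    nn.Linear(8, 8), nn.Softplus(),
    nn.Linear(8, 1),
).double()

# Mini-batch of n = 5 samples with 4 features each.
x = torch.randn(5, 4, dtype=torch.float64, requires_grad=True)
L = net(x)  # shape (5, 1), one prediction per sample

# Sum trick: one backward pass through the summed predictions.
J_sum = grad(L.sum(), x, retain_graph=True)[0]  # shape (5, 4)

# Reference: one backward pass per sample.
rows = []
for i in range(x.shape[0]):
    g = grad(L[i, 0], x, retain_graph=True)[0]  # nonzero only in row i
    rows.append(g[i])
J_loop = torch.stack(rows)  # shape (5, 4)

# Largest absolute difference between the two results.
max_diff = (J_sum - J_loop).abs().max().item()
print(max_diff)
```

Under these assumptions the two results should agree up to floating-point rounding, so a small discrepancy between the sum and the loop is consistent with numerical error, especially in float32. The equivalence relies on each prediction depending only on its own input row; a layer that mixes samples across the batch (batch normalization in training mode, for example) would break it, and the two approaches would then legitimately differ.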

neural-network python pytorch tensorflow