Backpropagation error from an in-place operation in PyTorch

Problem description

Background

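In my initial model, I use an RNN cell to learn user embeddings. I feed the embedding at time t-1 into the RNN and get the embedding at time t as output. I can then use the embedding at time t to learn the embedding at time t+1, and so on. For each batch, I get a tensor rnn_output that contains the embeddings of the unique users in that batch.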

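After that, I wanted to add an attention mechanism to my model, without putting a limit on t. So I use a list of tensors, rnn_history, created at the beginning of each epoch, to remember each user's embedding history,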

rnn_history = [torch.zeros((0,args.embedding_dim)).cuda()] * user_num
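A side note on the initialization above, as a small sketch: `[tensor] * user_num` fills every slot with a reference to the same empty tensor. That is harmless as long as slots are only rebound (for example with `torch.cat`), never written into. The sizes below are hypothetical, and `.cuda()` is dropped so the snippet runs on CPU.

```python
import torch

user_num, embedding_dim = 3, 4  # hypothetical sizes, not from the question
rnn_history = [torch.zeros((0, embedding_dim))] * user_num

# Every slot initially points at the very same tensor object.
shared_before = rnn_history[0] is rnn_history[1]

# Rebinding one slot with torch.cat creates a new tensor for that slot only.
rnn_history[0] = torch.cat([rnn_history[0], torch.ones(1, embedding_dim)])
shapes = [tuple(h.shape) for h in rnn_history]
```

So the shared initial tensor by itself is unlikely to be what autograd is complaining about here.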

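and use these tensors to compute attention in each batch. First, I append the latest embedding output to rnn_history. Then I compute attention over the user's entire history. Finally, I write the attended embedding back into the tensor rnn_output.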

for e in range(len(rnn_output)):
    rnn_history[batch_id[e]] = torch.cat([rnn_history[batch_id[e]], rnn_output])
    rnn_output[e, :] = model.attention(rnn_history[batch_id[e]])

This is how attention works in my model.

def attention(self,embeddings):
    q = torch.mm(embeddings[-1:],self.w_q)
    k = torch.mm(embeddings,self.w_k)
    v = torch.mm(embeddings,self.w_v)
    alpha = torch.softmax(torch.matmul(q,k.T) / self.embedding_dim,dim=1)
    return torch.sum((v.T * alpha).T,dim=0,keepdim=True)
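To make the shapes in `attention` easier to follow, here is a standalone sketch with random weights. The sizes `n` and `d` are hypothetical, and the weights are plain tensors rather than the model's parameters. It also checks that the transpose-multiply-sum on the last line gives the same result as a single matrix product `alpha @ v`.

```python
import torch

torch.manual_seed(0)
n, d = 5, 8  # hypothetical history length and embedding size
embeddings = torch.randn(n, d)
w_q, w_k, w_v = (torch.randn(d, d) for _ in range(3))

q = torch.mm(embeddings[-1:], w_q)                      # (1, d): query from the latest embedding
k = torch.mm(embeddings, w_k)                           # (n, d)
v = torch.mm(embeddings, w_v)                           # (n, d)
alpha = torch.softmax(torch.matmul(q, k.T) / d, dim=1)  # (1, n): weights over the history

out_original = torch.sum((v.T * alpha).T, dim=0, keepdim=True)  # form used in the question
out_matmul = torch.mm(alpha, v)                                 # equivalent weighted sum
```

Both forms return one (1, d) row per call, so the attention function itself contains no in-place operation.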

Problem

After adding the attention mechanism, I get the following error.

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [301, 128]], which is output 0 of TBackward, is at version 2; expected version 1 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
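For context on the "version 2; expected version 1" wording, here is a minimal sketch unrelated to the model above. Autograd records a version counter for each tensor it saves for the backward pass, and any in-place write to that tensor afterwards bumps the counter. `_version` is an internal attribute, used here only for illustration.

```python
import torch

a = torch.ones(3, requires_grad=True)
b = a.exp()                  # exp saves its output b for the backward pass
version_before = b._version
b.add_(1)                    # in-place edit bumps b's version counter
version_after = b._version

try:
    b.sum().backward()       # exp's backward finds its saved tensor at a newer version
    error = None
except RuntimeError as err:
    error = str(err)
```

The tensor named in the message is the one that changed after being saved. It is not necessarily the tensor on the line where the in-place write happens.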

I thought the error might be caused by this line.

rnn_output[e,:] = model.attention(rnn_history[batch_id[e]])

This looks like an in-place operation, so I rewrote my code.
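A quick way to check that suspicion, as a sketch with hypothetical shapes: indexed assignment writes into the existing tensor and bumps its version counter, whereas `torch.cat` allocates a new tensor and leaves its inputs untouched.

```python
import torch

rnn_output = torch.zeros(2, 3)        # hypothetical stand-in for the batch output
before = rnn_output._version
rnn_output[0, :] = torch.ones(3)      # indexed assignment writes into existing storage
after = rnn_output._version

pieces = [torch.ones(1, 3), torch.zeros(1, 3)]
rebuilt = torch.cat(pieces)           # new tensor; the inputs keep their versions
input_versions = [p._version for p in pieces]
```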

attention_output = torch.zeros((0, args.embedding_dim)).cuda()
for e in range(len(rnn_output)):
    rnn_history[batch_id[e]] = torch.cat([rnn_history[batch_id[e]], rnn_output])
    attention_output = torch.cat([attention_output, model.attention(rnn_history[batch_id[e]])])

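I use a new tensor, attention_output, to collect the attention outputs, so that a concatenation replaces the in-place operation. However, the error persists. Am I misunderstanding in-place operations? How should I rewrite my code to fix this?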
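One possible explanation, offered as a sketch and not a confirmed diagnosis: since the error survives the rewrite, the in-place write may not be in this loop at all. rnn_history keeps tensors that are still attached to the graphs of earlier batches, and `optimizer.step()` updates the weights in place between batches. A later `backward()` that reaches those old graphs then finds the saved weights at a newer version. The toy loop below reproduces that pattern and shows that storing a detached history avoids it. Everything in it is an assumption for illustration: `user_emb`, `w_rnn`, and `w_v` are stand-in parameters, and the failing variant assumes `retain_graph=True` (without it, PyTorch would likely complain about backpropagating through an already freed graph instead).

```python
import torch

def train(detach_history, steps=3, dim=4):
    torch.manual_seed(0)
    user_emb = torch.nn.Parameter(torch.randn(1, dim))  # hypothetical learnable input
    w_rnn = torch.nn.Parameter(torch.randn(dim, dim))   # stand-in for the RNN cell weights
    w_v = torch.nn.Parameter(torch.randn(dim, dim))     # stand-in for the attention weights
    opt = torch.optim.SGD([user_emb, w_rnn, w_v], lr=0.01)
    history = torch.zeros((0, dim))
    for _ in range(steps):
        out = torch.tanh(torch.mm(user_emb, w_rnn))
        full = torch.cat([history, out])       # past entries plus this step's output
        loss = torch.mm(full, w_v).sum()
        opt.zero_grad()
        loss.backward(retain_graph=not detach_history)
        opt.step()                             # updates the parameters in place
        # Keeping `full` attached drags the old graphs into the next backward pass.
        history = full.detach() if detach_history else full
    return history

try:
    train(detach_history=False)
    message = None
except RuntimeError as err:
    message = str(err)

history = train(detach_history=True)
```

With `detach_history=True`, gradients flow only through the current step's output, which is a form of truncated backpropagation through time. If gradients through the whole history are required, the stored entries cannot simply be detached, and the update schedule would need to change instead, for example one optimizer step per full pass.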


attention-model python pytorch tensor