How do I get the partial derivative of a probability with respect to the input in PyTorch?

Problem description

I want to generate adversarial examples with the following steps:

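1. Find a pretrained CNN classification model that takes an input X and outputs P(y|X), where y is the most likely label for X.
2. Find an input X' that yields y_fool, where X' is not far from X and y_fool is not equal to y.
3. The step for obtaining X' is: [image of the update formula, not available]

How do I get the partial derivative described in the image?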

Here is my code, but it does not give me the gradient (the model is VGG16):

x = torch.autograd.Variable(image,requires_grad=True)
output = model(image)
prob = nn.functional.softmax(output[0],dim=0)
    
prob.backward(torch.ones(prob.size()))
print(x.grad)

How should I modify my code? Can anyone help? I would be grateful.

Solution

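The point here is to backpropagate a "fake" example through the network. In other words, you need to maximize one particular coordinate of the output, one that does not correspond to the actual label of x.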

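For example, suppose your model outputs an N-dimensional vector and the label of x should be [1, 0, ...]. We will try to make the model predict [0, 1, ...] instead, so y_fool has its second coordinate set to 1 rather than its first.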
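To make the target quantity explicit, here is one way to write the derivative being asked for. The symbols are assumptions introduced for illustration and may not match the formula in the question's image: z(x) is the vector of logits the model produces, p = softmax(z), k is the index of y_fool, and \delta_{kj} is the Kronecker delta.

```latex
p_k(x) = \frac{e^{z_k(x)}}{\sum_{j=1}^{N} e^{z_j(x)}},
\qquad
\frac{\partial p_k}{\partial x}
  = \sum_{j=1}^{N} p_k \,(\delta_{kj} - p_j)\, \frac{\partial z_j}{\partial x}
```

Autograd evaluates exactly this chain rule when backward() is called on the scalar p_k, so none of it needs to be coded by hand.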

Side note: Variable is deprecated; just set the requires_grad flag to True on the tensor. So you get:

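import torch
from torch import nn

# image is assumed to be a batched tensor of shape (1, C, H, W)
x = image.clone().detach().requires_grad_(True)
# Feed x (not image) to the model so that x is part of the autograd graph
output = model(x)
# output[0] holds the logits of the single image; softmax runs over the class dimension
# If the model is well trained, prob_vector[1] should be almost 0 at the beginning
prob_vector = nn.functional.softmax(output[0], dim=0)
# We want to fool the model and maximize this coordinate instead of prob_vector[0]
fool_prob = prob_vector[1]
# fool_prob is a scalar tensor, so backward() can be called on it directly
fool_prob.backward()
# x.grad now holds the gradient of fool_prob with respect to x
print(x.grad)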
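For readers who want to try this without downloading VGG16, here is a minimal self-contained sketch. The tiny linear model, the 3x8x8 random input, and the names target, x_unused and unused_grad are hypothetical stand-ins, not part of the original question. The second half reproduces the question's pattern, where the model is fed image instead of x, to show why no gradient reaches x in that case.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical stand-in for the pretrained classifier (the question uses VGG16).
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 5))
model.eval()

# Hypothetical batch holding one random 3x8x8 "image".
image = torch.rand(1, 3, 8, 8)

# Leaf tensor that tracks gradients; the forward pass must use x, not image.
x = image.clone().detach().requires_grad_(True)

output = model(x)                                       # shape (1, 5)
prob_vector = nn.functional.softmax(output[0], dim=0)   # shape (5,)

target = 1  # assumed class index of y_fool
fool_prob = prob_vector[target]
fool_prob.backward()
grad = x.grad  # gradient of P(y_fool | x) w.r.t. x, same shape as x

# The question's pattern: a grad-tracking tensor is created, but the model
# is fed `image`, so that tensor never enters the computation graph.
x_unused = image.clone().detach().requires_grad_(True)
nn.functional.softmax(model(image)[0], dim=0)[target].backward()
unused_grad = x_unused.grad  # stays None

print(grad.shape, unused_grad)
```

A second issue in the question's code is worth noting: calling prob.backward(torch.ones(prob.size())) differentiates the sum of all probabilities, which is always 1, so even with the correct input the resulting gradient would be essentially zero. Selecting a single coordinate, as above, avoids that.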

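After that, if you want to use an optimizer in a loop to modify x, remember that PyTorch's optimizer.step method tries to minimize the loss, whereas you want to maximize it. So either use a negative learning rate (the built-in torch.optim optimizers reject a negative lr, so this only works with a manual update step) or flip the sign before backpropagating: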

# Maximizing a scalar is minimizing its opposite
(-fool_prob).backward()
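The loop mentioned above could look roughly like the following sketch. It again uses a hypothetical tiny linear model and random input in place of VGG16 and a real image; the helper target_prob, the learning rate, and the step count are illustrative choices, not values from the original answer.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical stand-in for the pretrained classifier.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 5))
model.eval()
for p in model.parameters():
    p.requires_grad_(False)  # only the input is optimized, not the weights

image = torch.rand(1, 3, 8, 8)                    # hypothetical input
x = image.clone().detach().requires_grad_(True)
target = 1                                        # assumed index of y_fool

# The optimizer updates x itself rather than the model parameters.
optimizer = torch.optim.SGD([x], lr=0.5)

def target_prob(inp):
    """Probability the model assigns to the target class for inp."""
    return nn.functional.softmax(model(inp)[0], dim=0)[target]

start_prob = target_prob(x).item()
for _ in range(50):
    optimizer.zero_grad()
    fool_prob = target_prob(x)
    # Maximizing fool_prob is the same as minimizing -fool_prob.
    (-fool_prob).backward()
    optimizer.step()
end_prob = target_prob(x).item()

# How far the perturbed input has moved from the original image.
distance = (x.detach() - image).norm().item()
print(start_prob, end_prob, distance)
```

This sketch does not constrain how far X' drifts from X or keep pixel values in a valid range. If that matters, one option is to clamp x or stop once the predicted class changes.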
