Why does flipping an image change the CNN pooling output?

Question

I'm looking at image embeddings and wondering why flipping an image changes the output. Consider resnet18 with its head removed, for example:

import torch
import torch.nn as nn
import torchvision.models as models

device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")

model = models.resnet18(pretrained=True)
model.fc = nn.Identity()  # remove the classification head; the model now returns the pooled features
model = model.to(device)
model.eval()

x = torch.randn(20, 3, 128, 128).to(device)
with torch.no_grad():
    y1 = model(x)           # original input
    y2 = model(x.flip(-1))  # input flipped along the width axis
    y3 = model(x.flip(-2))  # input flipped along the height axis

Most importantly, the last layer of this model pools the pixels/features down to 1 pixel.

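The way I see it, since we just have convolutions stacked on convolutions, all that should happen before the pooling is that the feature map gets flipped in the same way the image was flipped. Average pooling simply averages the last feature map (along each channel) and is invariant to its orientation. Adaptive average pooling should behave the same way.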

The key difference from "regular" convnets is that here we pool/average down to a width of one pixel.

However, when I look at y1 - y2 and y1 - y3 (including with AdaptiveMaxPool), the values are clearly different from zero. What am I getting wrong?

Answer

I think the pooled output changes because the input reaching the pooling layer is not what you expect.

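Short answer: the input is flipped, but the weights of the Conv2d layers are not. Those kernel weights would also have to be flipped to match the flipped input in order to get the expected output.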
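The short answer can be checked on a single convolution. The following is a minimal sketch, not the ResNet itself: `x` and `w` are random stand-ins for one layer's input and kernel (an assumption made for illustration), and the layer uses stride 1 with symmetric zero padding.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical stand-ins for one conv layer: random input and random 3x3 kernels.
# float64 keeps rounding noise well below the comparison tolerance.
x = torch.randn(2, 3, 16, 16, dtype=torch.float64)
w = torch.randn(8, 3, 3, 3, dtype=torch.float64)

def conv_then_pool(inp, weight):
    """Apply one convolution, then global average pooling over height and width."""
    feat = F.conv2d(inp, weight, padding=1)
    return feat, feat.mean(dim=(-2, -1))

feat, pooled = conv_then_pool(x, w)                          # original input
feat_fx, pooled_fx = conv_then_pool(x.flip(-1), w)           # only the input flipped
feat_fb, pooled_fb = conv_then_pool(x.flip(-1), w.flip(-1))  # input and kernel flipped

# Is the new feature map just the flipped version of the original one?
same_when_only_input_flipped = torch.allclose(feat_fx, feat.flip(-1))
same_when_both_flipped = torch.allclose(feat_fb, feat.flip(-1))

print(same_when_only_input_flipped, same_when_both_flipped)
print(torch.allclose(pooled_fx, pooled), torch.allclose(pooled_fb, pooled))
```

With random (asymmetric) kernels, the feature map is a flipped copy of the original only when the kernel is flipped along the same axis as the input, and only then does the pooled value match.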

Long answer: here, judging by the tail of the model, the output of Conv2d is passed to AdaptiveAvgPool2d. To keep the explanation simple, let's ignore BatchNorm for now.

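For simplicity, take the input tensor to be x = [1,3,5,4,7] and the kernel to be k = [0.3,0.5,0.8]. As the kernel slides over the input with stride=1, the output at position [0,0] will be [0.3 * 1 + 0.5 * 3 + 0.8 * 5] = 5.8, and at [0,2] it will be [0.3 * 5 + 0.5 * 4 + 0.8 * 7] = 9.1.

Now, if the input is flipped to x_flip = [7,4,5,3,1], the output at position [0,0] will be [0.3 * 7 + 0.5 * 4 + 0.8 * 5] = 8.1, and at [0,2] it will be [0.3 * 5 + 0.5 * 3 + 0.8 * 1] = 3.8.

If the convolution output had simply been flipped, the head of one output would equal the tail of the other. Since they differ in both cases (8.1 != 9.1 and 5.8 != 3.8), the output after the convolution layer is different, and the final output after pooling will be different/unexpected.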

So, to get the desired output here, the kernel needs to be flipped as well.
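The 1-D example can be reproduced in a few lines of plain Python. This is only a sketch: `correlate1d` and `avg_pool` are helper names introduced here for illustration, and the sliding window uses stride 1 with no padding.

```python
def correlate1d(x, k):
    """Slide kernel k over x with stride 1 and no padding, without reversing k."""
    n = len(x) - len(k) + 1
    return [sum(k[j] * x[i + j] for j in range(len(k))) for i in range(n)]

def avg_pool(values):
    """Global average pooling: collapse the whole output to one number."""
    return sum(values) / len(values)

x = [1, 3, 5, 4, 7]
k = [0.3, 0.5, 0.8]
x_flip = x[::-1]  # [7, 4, 5, 3, 1]
k_flip = k[::-1]  # [0.8, 0.5, 0.3]

out = correlate1d(x, k)                      # approx. [5.8, 6.6, 9.1]
out_flip_x = correlate1d(x_flip, k)          # approx. [8.1, 6.1, 3.8]
out_flip_both = correlate1d(x_flip, k_flip)  # approx. [9.1, 6.6, 5.8]

print(avg_pool(out), avg_pool(out_flip_x), avg_pool(out_flip_both))
```

Flipping only the input gives an output that is not the reverse of the original, so its average differs. Flipping the kernel too gives exactly the reversed output, and the pooled value is unchanged.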

conv-neural-network data-augmentation deep-learning image-processing
