PyTorch V-Net for segmentation: the final softmax layer has a different number of channels than the labels. How do I get the predicted output?

Problem description

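I am trying to build a V-Net. When I pass images to the network for segmentation during training, the output after the softmax activation has 2 channels (as specified by the architecture in the attached image), but the labels and inputs have 1. How do I convert this so that the output is the segmented image? During training, do I just take one of the channels as the final output (e.g. output = output[:,:,:]) and treat the other channel as background?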

outputs = network(inputs)

batch_size = 32
outputs.shape: [32,2,64,128,128]
inputs.shape: [32,1,128]
labels.shape: [32,128]

This is my V-Net forward pass:

def forward(self,x):
    # Initial input transition
    out = self.in_tr(x)

    # Downward transitions
    out,residual_0 = self.down_depth0(out)
    out,residual_1 = self.down_depth1(out)
    out,residual_2 = self.down_depth2(out)
    out,residual_3 = self.down_depth3(out)

    # Bottom layer
    out = self.up_depth4(out)

    # Upward transitions
    out = self.up_depth3(out,residual_3)        
    out = self.up_depth2(out,residual_2)
    out = self.up_depth1(out,residual_1)
    out = self.up_depth0(out,residual_0)

    # Pass to convert to 2 channels
    out = self.final_conv(out)
    
    # Softmax over the channel dimension (F is torch.nn.functional)
    out = F.softmax(out, dim=1)
    
    return out  # [batch_size,128]

V Net architecture as described in (https://arxiv.org/pdf/1606.04797.pdf)

Solution

That paper has two outputs because the authors predict two classes:

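"The network predictions, which consist of two volumes having the same resolution as the original input data, are processed through a soft-max layer which outputs the probability of each voxel to belong to foreground and to background."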
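Given a two-channel probability volume like the one described in that passage, a per-voxel segmentation mask can be obtained by taking the argmax over the channel dimension. The following is a minimal sketch, not the asker's actual network: the tensor `logits` is a hypothetical stand-in for the output of `final_conv`, the shapes are shrunk for illustration, and treating channel 1 as foreground is an assumption.

```python
import torch
import torch.nn.functional as F

# Hypothetical stand-in for the network's final_conv output:
# [batch, 2 classes, depth, height, width], kept tiny for illustration.
logits = torch.randn(2, 2, 4, 8, 8)

# Softmax over the channel dimension gives per-voxel class probabilities.
probs = F.softmax(logits, dim=1)

# Pick the most likely class per voxel; the channel dimension disappears.
mask = probs.argmax(dim=1)        # shape [2, 4, 8, 8], values 0 or 1

# Assumption: channel 1 is the foreground class.
foreground_prob = probs[:, 1]     # shape [2, 4, 8, 8]
```

The mask has one value per voxel, so it can be compared directly with a single-channel label volume of the same spatial size.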

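So this is not an autoencoder, where your inputs are passed back through the model as outputs. They use a set of labels that distinguish the voxels of interest (foreground) from everything else (background). If you want to use the V-Net in this manner, you will need to change your data.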

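It won't be as simple as designating one channel as the output, because this is a classification task rather than a regression task. You will need annotated labels to work with this model architecture.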
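To illustrate what training against annotated labels could look like, here is a rough sketch that uses per-voxel cross-entropy. Note that the V-Net paper optimises a Dice-based objective; cross-entropy is swapped in here only because it is a simpler stand-in. The tensors `logits` and `labels` and their shapes are hypothetical, not taken from the question.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical shapes: raw two-channel scores from the last convolution,
# and an annotated label volume holding one class index (0 or 1) per voxel.
logits = torch.randn(2, 2, 4, 8, 8, requires_grad=True)   # [N, C, D, H, W]
labels = torch.randint(0, 2, (2, 4, 8, 8))                # [N, D, H, W], int64

# nn.CrossEntropyLoss applies log-softmax internally, so it takes the raw
# scores rather than the output of F.softmax.
criterion = nn.CrossEntropyLoss()
loss = criterion(logits, labels)

# Gradients flow back to both channels of the prediction.
loss.backward()
```

If the labels are stored as a float tensor with a singleton channel, such as `[N, 1, D, H, W]`, something like `labels.squeeze(1).long()` would bring them into the form this loss expects.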

activation-function deep-learning machine-learning softmax vnet