My PyTorch model gives very bad results

Problem description

I'm new to deep learning with PyTorch. I have more experience with TensorFlow, so I'm not new to deep learning itself.

I'm currently working on a simple ANN classifier. There are only 2 classes, so I naturally went with the softmax + BCELoss combination.

The dataset looks like this:

shape of X_train (891,7)
Shape of Y_train (891,)
Shape of x_test (418,7)

I converted X_train and the other arrays to torch tensors (train_data, etc.). The next step is:

import torch
from torch.utils.data import TensorDataset, DataLoader

train_ds = TensorDataset(train_data, train_label)
# Define data loader
batch_size = 32
train_dl = DataLoader(train_ds, batch_size=batch_size, shuffle=True)
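The question doesn't show the tensor conversion step. A minimal sketch of it, assuming X_train and Y_train are NumPy arrays (random stand-in data with the question's shapes is used here, not the real dataset), could look like this. The point to check is the dtype: nn.Linear expects float32 inputs, and nn.BCELoss expects float targets.

```python
import numpy as np
import torch
from torch.utils.data import TensorDataset, DataLoader

# Hypothetical stand-in data with the shapes from the question:
# 891 training rows, 7 features, binary 0/1 labels.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(891, 7))
Y_train = rng.integers(0, 2, size=891)

# nn.Linear needs float32 inputs; nn.BCELoss needs float targets.
train_data = torch.tensor(X_train, dtype=torch.float32)
train_label = torch.tensor(Y_train, dtype=torch.float32)

train_ds = TensorDataset(train_data, train_label)
train_dl = DataLoader(train_ds, batch_size=32, shuffle=True)

# Inspect one batch: features are (32, 7), labels are (32,).
xb, yb = next(iter(train_dl))
print(xb.shape, yb.shape)
print("batches per epoch:", len(train_dl))
```

With 891 rows and batch_size=32 there are 28 batches per epoch, the last one holding 27 rows.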

The model class I wrote is as follows:

import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):

    def __init__(self):
        super(Net,self).__init__()
   
        # an affine operation: y = Wx + b
        self.fc1 = nn.Linear(7,32)
        self.bc1 = nn.BatchNorm1d(32)
        self.fc2 = nn.Linear(32,64)
        self.bc2 = nn.BatchNorm1d(64)
        self.fc3 = nn.Linear(64,128)
        self.bc3 = nn.BatchNorm1d(128)
        self.fc4 = nn.Linear(128,32)
        self.bc4 = nn.BatchNorm1d(32)
        self.fc5 = nn.Linear(32,10)
        self.bc5 = nn.BatchNorm1d(10)
        self.fc6 = nn.Linear(10,1)
        self.bc6 = nn.BatchNorm1d(1)
        
        self.drop = nn.Dropout(p=0.5)  # element-wise dropout for (batch, features) inputs
        
        
    def forward(self,x):
        # Note: since this runs inside forward(), fc1's weights are re-initialized on every call
        torch.nn.init.xavier_uniform_(self.fc1.weight)
        x = self.fc1(x)
        x = self.bc1(x)
        x = F.relu(x)
        
        x = self.drop(x)
        x = self.fc2(x)
        x = self.bc2(x)
        x = F.relu(x)
        
        #x = self.drop(x)
        x = self.fc3(x)
        x = self.bc3(x)
        x = F.relu(x)
        
        x = self.drop(x)
        x = self.fc4(x)
        x = self.bc4(x)
        x = F.relu(x)
        
        #x = self.drop(x)
        x = self.fc5(x)
        x = self.bc5(x)
        x = F.relu(x)
        
        x = self.drop(x)
        x = self.fc6(x)
        x = self.bc6(x)        
        x = torch.sigmoid(x)
        return x
    
model = Net()

Loss function and optimizer definitions:

loss = nn.BCELoss()
optimizer = torch.optim.Adam(model.parameters(),lr=0.00001,betas=(0.9,0.999),eps=1e-08,weight_decay=0,amsgrad=False)

Finally, the training loop runs the forward pass over the epochs:

num_epochs = 1000
# Repeat for given number of epochs
for epoch in range(num_epochs):
        
    # Train with batches of data
    for xb,yb in train_dl:
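        # 1. Forward pass
        pred = model(xb)
        
        # Reshape targets from (batch,) to (batch, 1) to match pred
        yb = torch.unsqueeze(yb,1)
        
        #print(pred,yb)
        print('grad',model.fc1.weight.grad)
        
        # 2. Compute the loss
        l = loss(pred,yb)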
        #print('loss',l)
                    
        # 3. Compute gradients
        l.backward()
            
        # 4. Update parameters using gradients
        optimizer.step()
            
        # 5. Reset the gradients to zero
    optimizer.zero_grad()
    
    # Print the progress
    if (epoch+1) % 10 == 0:
        print('Epoch [{}/{}],Loss: {:.4f}'.format(epoch+1,num_epochs,l.item()))

In the output I can see that the weight gradients are not zero after each pass over all the batches, even after zero_grad is applied.

However, the model is terrible. I only get an F1 score of about 50%! It performs badly even when I use it to predict on train_dl itself!

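I'd like to know the cause. Are the weight gradients non-zero but the weights not being updated correctly? Is the optimizer not optimizing the weights? Or is it something else?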

Could someone take a look?

I have already tried different loss functions and optimizers, as well as a smaller dataset, larger batches, and different hyperparameters.

Thanks! :)

Solution

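First, you don't use a softmax activation with BCE loss unless you have 2 output nodes, which is not the case here. In PyTorch, unlike CCE (nn.CrossEntropyLoss), which has a built-in (log-)softmax, BCE loss (nn.BCELoss) does not apply any activation function before computing the loss. So if you want to use BCE, you must apply a sigmoid (or any function f: R -> [0, 1]) at the output layer, which you don't.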
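A small sketch of the loss behavior described above, using random logits rather than the question's data. nn.BCELoss needs probabilities, so a sigmoid is applied first. nn.BCEWithLogitsLoss (not used in the question) applies the sigmoid itself. nn.CrossEntropyLoss applies log-softmax internally and takes integer class indices.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Random stand-ins: raw scores from a final nn.Linear(..., 1) and 0/1 targets.
logits = torch.randn(8, 1)
targets = torch.randint(0, 2, (8, 1)).float()  # BCE targets must be float

# nn.BCELoss applies no activation, so it must receive probabilities in [0, 1].
bce = nn.BCELoss()(torch.sigmoid(logits), targets)

# nn.BCEWithLogitsLoss applies the sigmoid internally (and is more numerically stable).
bce_logits = nn.BCEWithLogitsLoss()(logits, targets)

# nn.CrossEntropyLoss applies log-softmax internally: with 2 output nodes it takes
# raw logits of shape (N, 2) and integer class indices of shape (N,).
two_class_logits = torch.randn(8, 2)
class_idx = targets.squeeze(1).long()
ce = nn.CrossEntropyLoss()(two_class_logits, class_idx)

print(f"BCELoss(sigmoid(x)):  {bce.item():.6f}")
print(f"BCEWithLogitsLoss(x): {bce_logits.item():.6f}")
print(f"CrossEntropyLoss:     {ce.item():.6f}")
```

The first two values agree. If the logits variant were used, the final torch.sigmoid in forward() would be dropped and predictions thresholded at a logit of 0 instead of a probability of 0.5.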

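Also, if you want to do mini-batch SGD (the usual setup when iterating over a DataLoader), you should call optimizer.zero_grad() for every batch. If you don't, the gradients keep accumulating, so each optimizer.step() within an epoch uses the sum of all gradients computed so far rather than the current batch's gradient. That is not mini-batch SGD, where each step uses only the current batch. It is not full-batch gradient descent either, since the parameters are still updated after every batch.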
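A minimal sketch of a training loop with zero_grad() moved inside the batch loop. It first shows, on a single toy parameter, that .backward() adds to .grad instead of overwriting it. The data (a hypothetical rule: label 1 when the 7 features sum to a positive value), the smaller stand-in model, lr=1e-2, and the use of accuracy instead of F1 are assumptions for illustration, not the question's setup. The model keeps the question's sigmoid + nn.BCELoss pairing.

```python
import torch
import torch.nn as nn
from torch.utils.data import TensorDataset, DataLoader

torch.manual_seed(0)

# .backward() accumulates into .grad: three backward passes without zeroing.
w = torch.ones(1, requires_grad=True)
for _ in range(3):
    (2 * w).sum().backward()  # d(2w)/dw = 2 each time
print(w.grad)  # tensor([6.]): 2 + 2 + 2, not 2

# Hypothetical toy data: 7 features, label 1 when the features sum to > 0.
X = torch.randn(891, 7)
y = (X.sum(dim=1) > 0).float()
train_dl = DataLoader(TensorDataset(X, y), batch_size=32, shuffle=True)

# Small stand-in model (not the question's Net) ending in a sigmoid, as BCELoss needs.
model = nn.Sequential(nn.Linear(7, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())
loss = nn.BCELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)

for epoch in range(20):
    model.train()
    for xb, yb in train_dl:
        optimizer.zero_grad()            # reset gradients for every batch
        pred = model(xb)                 # probabilities, shape (batch, 1)
        l = loss(pred, yb.unsqueeze(1))  # targets reshaped to (batch, 1)
        l.backward()                     # .grad now holds this batch's gradient only
        optimizer.step()

# Accuracy on the training data, thresholding probabilities at 0.5.
model.eval()  # matters for models with dropout or batch norm
with torch.no_grad():
    train_pred = (model(X).squeeze(1) > 0.5).float()
    acc = (train_pred == y).float().mean().item()
print(f"train accuracy: {acc:.3f}")
```

Calling zero_grad() before the forward pass (rather than after step()) is a common convention. Either placement works as long as it runs once per batch.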
