PyTorch BCE loss does not decrease on a word sense disambiguation task

Problem description

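I am working on word sense disambiguation and have built my own vocabulary from the 300,000 most common English words. My model is very simple: each word in a sentence (represented by its index in the vocabulary) passes through an embedding layer, and the resulting embeddings are averaged. The averaged embedding is then fed to a linear layer, as shown in the model below.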

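import torch
import torch.nn as nn

class TestingClassifier(nn.Module):
  def __init__(self,vocabSize,features,embeddingDim):
      super(TestingClassifier,self).__init__()
      self.embeddings = nn.Embedding(vocabSize,embeddingDim)
      # The linear layer expects `features` inputs per example
      self.linear = nn.Linear(features,2)
      self.sigmoid = nn.Sigmoid()

  def forward(self,inputs):
      # Look up one embedding vector per word index
      embeds = self.embeddings(inputs)
      # Note: dim=-1 averages over the last axis of `embeds`, i.e. the embedding dimension
      avged = torch.mean(embeds,dim=-1)
      output = self.linear(avged)
      # Squash both outputs into (0, 1) for BCELoss
      output = self.sigmoid(output)
      return output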
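
To see what the forward pass above produces, the following sketch traces the tensor shapes. It assumes `inputs` is a batch of padded index sequences of shape (batch, seqLen); the sizes used here (4, 10, 24, and a vocabulary of 100) are made up for illustration and are not from the post. With `dim=-1` the mean collapses the embedding axis and leaves one number per word, whereas averaging over `dim=1` gives one `embeddingDim`-sized vector per sentence, which is what "the resulting embeddings are averaged" seems to describe.

```python
import torch
import torch.nn as nn

# Illustrative sizes only (assumptions, not taken from the post)
batch, seqLen, embeddingDim, vocabSize = 4, 10, 24, 100

embeddings = nn.Embedding(vocabSize, embeddingDim)
inputs = torch.randint(0, vocabSize, (batch, seqLen))  # word indices

embeds = embeddings(inputs)                # (batch, seqLen, embeddingDim)
mean_over_embedding = embeds.mean(dim=-1)  # (batch, seqLen): one scalar per word
mean_over_words = embeds.mean(dim=1)       # (batch, embeddingDim): one vector per sentence

print(embeds.shape, mean_over_embedding.shape, mean_over_words.shape)
```

If the sentence-level average is what was intended, the linear layer would take `embeddingDim` inputs rather than `features`. Whether that alone resolves the stalled loss is not established by the post.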

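I use BCELoss as the loss function and SGD as the optimizer. My problem is that the loss barely decreases as training progresses, almost as if it were converging at a very high value. I tried several learning rates (0.0001, 0.001, 0.01 and 0.1), but the problem is the same with each.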
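
As a reference point for "a very high loss", the sketch below computes the binary cross-entropy of a model that outputs 0.5 for everything, which is ln 2 ≈ 0.693. The label layout (float one-hot targets of shape (batch, 2)) is an assumption, since the post does not show the labels; `nn.BCELoss` does require a floating-point target with the same shape as the prediction.

```python
import math
import torch
import torch.nn as nn

lossFunction = nn.BCELoss()

# Assumed label layout: float one-hot targets, shape (batch, 2)
labels = torch.tensor([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])

# A model that has learned nothing predicts 0.5 everywhere
prediction = torch.full_like(labels, 0.5)

chance_loss = lossFunction(prediction, labels).item()
print(chance_loss, math.log(2))  # both are about 0.693
```

A training loss that hovers near this value would suggest that the network's outputs carry little information about the labels, which is worth checking before tuning the learning rate further.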

My training function is as follows:

from tqdm import tqdm

def train_model(model,optimizer,lossFunction,batchSize,epochs,isRnnModel,trainDataLoader,validDataLoader,earlyStop = False,maxPatience = 1):

  validationAcc = []
  patienceCounter = 0
  stopTraining = False
  model.train()

  # Train network
  for epoch in range(epochs):
    losses = []
    if(stopTraining):
      break

    for inputs,labels in tqdm(trainDataLoader,position=0,leave=True):

      optimizer.zero_grad()

      # Predict and calculate loss
      prediction = model(inputs)
      loss = lossFunction(prediction,labels)
      losses.append(loss.item())  # store a plain float so the autograd graph is not kept alive

      # Backward propagation
      loss.backward()

      # Readjust weights
      optimizer.step()

    print(sum(losses) / len(losses))
    curValidAcc = check_accuracy(validDataLoader,model,isRnnModel) # Check accuracy on validation set
    curTrainAcc = check_accuracy(trainDataLoader,model,isRnnModel) # Check accuracy on training set
    print("Epoch",epoch + 1,"Training accuracy",curTrainAcc,"Validation accuracy:",curValidAcc)

    # Control early stopping
    if(earlyStop):
      if(patienceCounter == 0):
        if(len(validationAcc) > 0 and curValidAcc < validationAcc[-1]):
          benchmark = validationAcc[-1]
          patienceCounter += 1
          print("Patience counter",patienceCounter)
      
      elif(patienceCounter == maxPatience):
        print("EARLY STOP. Patience level:",patienceCounter)
        stopTraining = True

      else:
        if(curValidAcc < benchmark):
          patienceCounter += 1
          print("Patience counter",patienceCounter)
        
        else:
          benchmark = curValidAcc
          patienceCounter = 0

      validationAcc.append(curValidAcc)

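The batch size is 32 (the training set contains 8,000 rows), the vocabulary size is 300k, and the embedding dimension is 24. I tried adding more linear layers to the network, but it made no difference. Even after many rounds of training, prediction accuracy on both the training and validation sets stays around 50%, which is terrible. Any help is greatly appreciated!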
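
The figures above imply the following rough sizes. This is plain arithmetic on the stated numbers; the variable names are chosen here for illustration.

```python
trainRows = 8000
batchSize = 32
vocabSize = 300_000
embeddingDim = 24

stepsPerEpoch = trainRows // batchSize        # optimizer updates per epoch
embeddingParams = vocabSize * embeddingDim    # weights in the embedding table

print(stepsPerEpoch)      # 250
print(embeddingParams)    # 7200000
```

With only 8,000 sentences, it is plausible that many of the 300k embedding rows never appear in training and so never receive a gradient. The post does not say how many distinct words the data contains, so this is an inference rather than a finding.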
