DistilBERT with ktrain: 'too many values to unpack'

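Problem description

I'm trying to run DistilBERT with ktrain in Colab, but I get the error "too many values to unpack". I'm doing toxic comment classification and have uploaded 'train.csv' from CivilComments. I can run BERT, but not DistilBERT.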

#prerequisites:
!pip install ktrain
import ktrain
from ktrain import text as txt
DATA_PATH = '/content/train.csv'
NUM_WORDS = 50000 
MAXLEN = 150 
label_columns = ["toxic","severe_toxic","obscene","threat","insult","identity_hate"]

If I preprocess with 'bert', everything works, but then I can't use the distilbert model. Preprocessing with distilbert raises the error:

(x_train,y_train),(x_test,y_test),preproc = txt.texts_from_csv(DATA_PATH,'comment_text',label_columns=label_columns,val_filepath=None,max_features=NUM_WORDS,maxlen=MAXLEN,preprocess_mode='distilbert')

'too many values to unpack (expected 2)'. If I replace distilbert with bert it works (code below), but then I'm forced to use bert as the model. Preprocessing with bert works fine:

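(x_train,y_train),(x_test,y_test),preproc = txt.texts_from_csv(DATA_PATH,'comment_text',label_columns=label_columns,val_filepath=None,max_features=NUM_WORDS,maxlen=MAXLEN,preprocess_mode='bert')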

This runs without error, but afterwards I can't use distilbert, see below:

Example: model = txt.text_classifier('distilbert',train_data=(x_train,y_train),preproc=preproc)

Error message: if 'bert' is selected model, then preprocess_mode='bert' should be used and vice versa

I want to call txt.texts_from_csv with preprocess_mode='distilbert' and use the result with the distilbert model. How can I avoid the "too many values to unpack" error?

Link the code is based on: Arun Maiya (2019). ktrain: A Lightweight Wrapper for Keras to Help Train Neural Networks. https://towardsdatascience.com/ktrain-a-lightweight-wrapper-for-keras-to-help-train-neural-networks-82851ba889c.

Solution

As shown in this example notebook, when preprocess_mode='distilbert' is specified, the texts_from_* functions return TransformerDataset objects (not NumPy arrays). So you need to do something like this:

trn,val,preproc = txt.texts_from_csv(DATA_PATH,'comment_text',label_columns=label_columns,val_filepath=None,max_features=NUM_WORDS,maxlen=MAXLEN,preprocess_mode='distilbert')
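
To illustrate why the old unpacking pattern fails, here is a minimal pure-Python sketch. It assumes the returned dataset behaves like a sequence of batches when iterated. BatchedDataset and load_data are hypothetical stand-ins written for this example; they are not ktrain's TransformerDataset or texts_from_csv.

```python
# Hypothetical stand-in for a dataset object that yields batches when iterated.
# It only mimics the shape of the problem; it is not ktrain's implementation.
class BatchedDataset:
    def __init__(self, texts, labels, batch_size=2):
        self.texts = texts
        self.labels = labels
        self.batch_size = batch_size

    def __len__(self):
        # Number of batches, rounded up.
        return -(-len(self.texts) // self.batch_size)

    def __getitem__(self, idx):
        # Raising IndexError past the last batch lets Python iterate the object.
        if idx < 0 or idx >= len(self):
            raise IndexError(idx)
        start = idx * self.batch_size
        end = start + self.batch_size
        return self.texts[start:end], self.labels[start:end]


def load_data():
    # Hypothetical loader returning three values: train set, validation set, preprocessor.
    trn = BatchedDataset(["a", "b", "c", "d", "e", "f"], [0, 1, 0, 1, 0, 1])
    val = BatchedDataset(["g", "h"], [1, 0])
    preproc = "preprocessor placeholder"
    return trn, val, preproc


# Old pattern: treats each dataset as an (x, y) pair.
# Unpacking the training object iterates over its three batches, which do not fit two names.
try:
    (x_train, y_train), (x_test, y_test), preproc = load_data()
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# Pattern from the answer: keep each dataset as a single object.
trn, val, preproc = load_data()

print(error_message)
print(len(trn), len(val))
```

With the real library, trn would then presumably be passed as train_data to txt.text_classifier('distilbert', ...) in place of the (x_train, y_train) tuple from the question.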