How do I specify labels/features when training a Keras model from a tf.data.Dataset?

Problem description

I followed the official tutorial on how to create and load TFRecords.

As a result, my dataset now contains examples of the form {"feat1":...,"feat2":...,"feat3":...}.

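I have a (subclassed) model that is meant to take a representation derived from feat1 as input and predict feat2 and feat3 as outputs. Its .call() method returns a dictionary, {"feat3":logits3,"feat2":logits2}, and I use SparseCategoricalCrossentropy(from_logits=True) as the loss function. According to {{3}}, you can name the output layers to distinguish the different outputs, which is what I did in my subclassed model.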
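To make the link between named outputs and losses concrete, here is a minimal sketch using a toy functional model rather than the subclassed model from the question. The layer sizes, the vocabulary size of 10, and the in-memory data are all assumptions made for illustration. The output dictionary keys, the output layer names, the loss dictionary keys, and the label dictionary keys all use the same names.

```python
import tensorflow as tf

# Toy model (assumed architecture): integer ids in, two sets of class logits out.
feat1 = tf.keras.Input(shape=(3,), dtype="int32", name="feat1")
x = tf.keras.layers.Embedding(input_dim=10, output_dim=8)(feat1)
x = tf.keras.layers.GlobalAveragePooling1D()(x)
outputs = {
    "feat2": tf.keras.layers.Dense(2, name="feat2")(x),  # logits over 2 classes
    "feat3": tf.keras.layers.Dense(3, name="feat3")(x),  # logits over 3 classes
}
model = tf.keras.Model(inputs={"feat1": feat1}, outputs=outputs)

# One loss per named output; the keys match the output names above.
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
    loss={
        "feat2": tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        "feat3": tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    },
)

# Made-up data: the label dict uses the same keys as the model outputs.
features = {"feat1": tf.constant([[1, 2, 3], [4, 5, 6], [7, 8, 9], [1, 1, 1]], dtype=tf.int32)}
labels = {"feat2": tf.constant([0, 1, 0, 1]), "feat3": tf.constant([2, 0, 1, 2])}
train_data = tf.data.Dataset.from_tensor_slices((features, labels)).batch(2)

history = model.fit(train_data, epochs=1, verbose=0)
```

Passing a single loss instance to compile(), as in the question, applies that loss to every output. The dictionary form is only needed when the outputs should use different losses or when the mapping should be explicit.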

Now I load my training data as a tf.data.Dataset (as described in the tutorial) and want to pass it to the .fit() function of my Keras model.
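One way to tell .fit() which features are inputs and which are labels is to map every feature dictionary to an (inputs, labels) tuple before training, since .fit() expects a dataset to yield such tuples. The sketch below assumes feat1 is already numeric and uses a small in-memory dataset as a stand-in for the parsed TFRecords. The helper name split_features_and_labels is hypothetical.

```python
import tensorflow as tf

# Stand-in for the parsed TFRecord dataset: each element is a dict of features.
raw = tf.data.Dataset.from_tensor_slices({
    "feat1": tf.constant([[1, 2, 3], [4, 5, 6], [7, 8, 9], [1, 1, 1]], dtype=tf.int32),
    "feat2": tf.constant([0, 1, 0, 1], dtype=tf.int32),
    "feat3": tf.constant([2, 0, 1, 2], dtype=tf.int32),
})

def split_features_and_labels(example):
    """Turn one feature dict into an (inputs, labels) pair."""
    inputs = {"feat1": example["feat1"]}
    labels = {"feat2": example["feat2"], "feat3": example["feat3"]}
    return inputs, labels

# Batch in the pipeline itself rather than through fit().
train_data = raw.map(split_features_and_labels).batch(2)

features, labels = next(iter(train_data))
print(features["feat1"].shape, sorted(labels.keys()))
```

Because the dataset is already batched, batch_size would be left out of the .fit() call. The Keras documentation says not to specify it when the input is a dataset.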

My current setup looks like this:

prep_model = HuggingFaceModel()
tokenizer = HuggingFacetokenizer()
my_model = MySubclassedModel()

inputs = tf.keras.Input(shape=(None,),name="feat1")
input_dict = tokenizer.encode(inputs) # causes TypeError - works if I pass feat1 manually
x = tf.constant(input_dict)[None,:] # works if I pass feat1 manually
x = prep_model(inputs)
out = my_model(x)

model = tf.keras.Model(inputs=inputs,outputs=[out["feat2"],out["feat3"]])

train_data = read_tf_dataset(path)
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3)

model.compile(optimizer,loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))
model.fit(train_data,epochs=1,batch_size=8)

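However, when I try this, I get a TypeError at input_dict = tokenizer.encode(inputs), with no further error message. Since this exact setup works for manual inference with the HuggingFace model, I assume the problem lies in the .fit() method or in the way I pass the dataset.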

My debugger only tells me that inputs is <tf.Tensor 'feat1:0' shape=(None,None) dtype=float32>, and when I try tf.print(inputs), I get:

AttributeError: 'Tensor' object has no attribute '_datatype_enum'

If anyone can help, or at least give me a hint on how to debug this, I would be very grateful! I am new to TF / Keras.
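A possible debugging direction, offered as an assumption rather than a confirmed diagnosis: tf.keras.Input returns a symbolic placeholder with no concrete value, while a tokenizer's encode() generally expects real Python strings. This would explain why the call works with manually passed data but fails inside the model definition. One workaround is to tokenize in the data pipeline, before the model sees the data. The sketch below uses a hypothetical toy_encode function and a made-up vocabulary as stand-ins for the HuggingFace tokenizer, and wraps the call in tf.py_function so that it runs as ordinary Python on concrete values.

```python
import numpy as np
import tensorflow as tf

# Hypothetical stand-in for a real tokenizer: maps known words to integer ids.
VOCAB = {"<unk>": 0, "hello": 1, "world": 2, "keras": 3}

def toy_encode(text):
    """Encode a Python string into a list of integer token ids."""
    return [VOCAB.get(word, 0) for word in text.lower().split()]

def encode_eagerly(text_tensor):
    """Runs as plain Python, so the string value is available via .numpy()."""
    text = text_tensor.numpy().decode("utf-8")
    return np.array(toy_encode(text), dtype=np.int32)

def tokenize_example(example):
    """Tokenize feat1 and split the example into (inputs, labels)."""
    ids = tf.py_function(encode_eagerly, [example["feat1"]], tf.int32)
    ids.set_shape([None])  # py_function drops static shape information
    return {"feat1": ids}, {"feat2": example["feat2"], "feat3": example["feat3"]}

# Made-up raw examples with a string feature and two integer labels.
raw = tf.data.Dataset.from_tensor_slices({
    "feat1": ["hello world", "keras", "hello keras world"],
    "feat2": [0, 1, 0],
    "feat3": [2, 0, 1],
})

# padded_batch pads the variable-length id sequences with zeros.
train_data = raw.map(tokenize_example).padded_batch(3)
features, labels = next(iter(train_data))
print(features["feat1"].numpy())
```

With this arrangement the Keras model would start from an integer Input for the token ids, and the tokenizer would no longer be called on a symbolic tensor.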

huggingface-tokenizers keras tensorflow tf.data.dataset