Multi-output classification with TensorFlow Extended (TFX)

Problem description

I am very new to TFX (TensorFlow Extended) and have been working through the sample tutorial on the TensorFlow portal to learn enough to apply it to my own dataset.

In my scenario, the problem is not to predict a single label: I need to predict 2 outputs (Category 1, Category 2).

I have already done this with the plain TensorFlow Keras Functional API and it works well, but now I want to see whether I can fit it into a TFX pipeline.

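The error occurs in the Trainer stage of the pipeline, inside _input_fn. I suspect this is because I'm not correctly splitting the given data into a (features, labels) tensor pair in the pipeline.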

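Scenario:

Each row of the input data has the form [Col1, Col2, Col3, ClassificationA, ClassificationB].
ClassificationA and ClassificationB are the categorical labels I am trying to predict with the Keras functional model.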

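The output layers of the Keras functional model look like this, with 2 outputs attached to a single dense layer (note: the _xf suffix only indicates that I have encoded the classes as int representations):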

output_1 = tf.keras.layers.Dense(TargetA_Class, activation='sigmoid', name='ClassificationA_xf')(dense)
output_2 = tf.keras.layers.Dense(TargetB_Class, name='ClassificationB_xf')(dense)
model = tf.keras.Model(inputs=inputs, outputs=[output_1, output_2])

In the trainer module file, I import the required packages at the top of the file:

import tensorflow_transform as tft
from tfx.components.tuner.component import TunerFnResult
import tensorflow as tf
from typing import List,Text
from tfx.components.trainer.executor import TrainerFnArgs
from tfx.components.trainer.fn_args_utils import DataAccessor,FnArgs
from tfx_bsl.tfxio import dataset_options

The current _input_fn in the trainer module file looks like this (following the tutorial):

def _input_fn(file_pattern: List[Text],
              data_accessor: DataAccessor,
              tf_transform_output: tft.TFTransformOutput,
              batch_size: int = 200) -> tf.data.Dataset:
  """Generates features and label dataset for tuning/training.

  Args:
    file_pattern: List of paths or patterns of input tfrecord files.
    data_accessor: DataAccessor for converting input to RecordBatch.
    tf_transform_output: A TFTransformOutput.
    batch_size: representing the number of consecutive elements of returned
      dataset to combine in a single batch

  Returns:
    A dataset that contains (features, indices) tuple where features is a
      dictionary of Tensors, and indices is a single Tensor of label indices.
  """
  return data_accessor.tf_dataset_factory(
      file_pattern,
      dataset_options.TensorFlowDatasetOptions(
          batch_size=batch_size,
          # label_key=[_transformed_name(x) for x in _CATEGORICAL_LABEL_KEYS]),
          # Attempt to pass two label keys; this is the line that fails to parse.
          label_key=_transformed_name(_CATEGORICAL_LABEL_KEYS[0]), _transformed_name(_CATEGORICAL_LABEL_KEYS[1])),
      tf_transform_output.transformed_metadata.schema)

The error I get when I run the Trainer component is:

label_key=_transformed_name(_CATEGORICAL_LABEL_KEYS[0]), _transformed_name(_CATEGORICAL_LABEL_KEYS[1])),
^
SyntaxError: positional argument follows keyword argument
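This error comes from the Python parser rather than from TFX: in a call, every positional argument has to appear before any keyword argument, so a second value placed after label_key=... is rejected before any code runs. A minimal sketch that reproduces it, using a hypothetical make_options function as a stand-in (it is not part of TFX and is never actually called):

```python
def try_compile(source: str) -> str:
  """Returns 'ok' if the source parses, otherwise the SyntaxError message."""
  try:
    # compile() only parses the text; nothing in it is executed.
    compile(source, "<example>", "exec")
    return "ok"
  except SyntaxError as err:
    return err.msg


# make_options is a hypothetical name used purely for illustration.
bad_call = "make_options(batch_size=200, label_key='a_xf', 'b_xf')"
good_call = "make_options(batch_size=200, label_key='a_xf')"

print(try_compile(bad_call))   # parser complains about the trailing 'b_xf'
print(try_compile(good_call))  # a single keyword value parses fine
```

So the message says nothing yet about whether label_key can hold more than one key; it only says that the second key was written as a stray positional argument.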

I also tried label_key=[_transformed_name(x) for x in _CATEGORICAL_LABEL_KEYS], which also produced an error.

However, if I pass in only one label key, label_key=_transformed_name(_CATEGORICAL_LABEL_KEYS[0]), it works fine.

FYI: _CATEGORICAL_LABEL_KEYS is just a list containing the names of the 2 outputs I am trying to predict (ClassificationA, ClassificationB).

_transformed_name is just a function that returns the updated name/key for the transformed data:

def _transformed_name(key):
  return key + '_xf'

Question:

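As far as I can tell, the label_key parameter of dataset_options.TensorFlowDatasetOptions accepts only a single string/label name, which means it may not be able to produce a dataset with multiple labels.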

Is there a way to modify _input_fn so that the dataset it returns carries 2 output labels? The returned tensors would look like this:

Feature_Tensor: {Col1_xf: Col1_transformedfeature_values, Col2_xf: Col2_transformedfeature_values, Col3_xf: Col3_transformedfeature_values}

Label_Tensor: {ClassificationA_xf: ClassA_encodedlabels, ClassificationB_xf: ClassB_encodedlabels}
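One possible way to get this (features, labels) structure is sketched here. It assumes that leaving label_key unset makes tf_dataset_factory yield a single dictionary holding every transformed column, labels included; that should be checked against the installed tfx_bsl version. The names split_features_and_labels and _multi_label_input_fn, and the hard-coded LABEL_KEYS list, are illustrative and do not come from TFX.

```python
LABEL_KEYS = ["ClassificationA_xf", "ClassificationB_xf"]


def split_features_and_labels(batch, label_keys):
  """Splits one dict of columns into a (features, labels) pair of dicts."""
  features = dict(batch)  # shallow copy, so the incoming dict is left intact
  labels = {key: features.pop(key) for key in label_keys}
  return features, labels


def _multi_label_input_fn(file_pattern, data_accessor, tf_transform_output,
                          batch_size=200):
  """Sketch of an input_fn that returns (features, {label_name: tensor})."""
  # Imported lazily so the helper above can be tried without TFX installed.
  from tfx_bsl.tfxio import dataset_options

  # Assumption: with no label_key, each element is one dict of all columns.
  dataset = data_accessor.tf_dataset_factory(
      file_pattern,
      dataset_options.TensorFlowDatasetOptions(batch_size=batch_size),
      tf_transform_output.transformed_metadata.schema)
  return dataset.map(
      lambda batch: split_features_and_labels(batch, LABEL_KEYS))


# Plain Python values stand in for tensors to show the resulting structure.
example_batch = {"Col1_xf": 0.1, "Col2_xf": 0.2, "Col3_xf": 0.3,
                 "ClassificationA_xf": 1, "ClassificationB_xf": 2}
features, labels = split_features_and_labels(example_batch, LABEL_KEYS)
print(sorted(features))  # ['Col1_xf', 'Col2_xf', 'Col3_xf']
print(sorted(labels))    # ['ClassificationA_xf', 'ClassificationB_xf']
```

Keras matches dictionary labels to model outputs by name, so the keys in the labels dictionary need to equal the output layer names (ClassificationA_xf and ClassificationB_xf in the model above).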

Hoping for advice from the wider TFX community!

