Tensorflow / AI Cloud Platform: HyperTune trials failed to report the hyperparameter tuning metric

Problem description

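I am using the tf.estimator API with TensorFlow 2.1 on Google AI Platform to build a DNN regressor. To use AI Platform Training hyperparameter tuning, I followed Google's docs. I used the following configuration parameters: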

config.yaml:

trainingInput:
    scaleTier: BASIC
    hyperparameters:
        goal: MINIMIZE
        maxTrials: 2
        maxParallelTrials: 2
        hyperparameterMetricTag: rmse
        enableTrialEarlyStopping: True
        params:
        - parameterName: batch_size
          type: DISCRETE
          discreteValues:
          - 100
          - 200
          - 300
        - parameterName: lr
          type: DOUBLE
          minValue: 0.0001
          maxValue: 0.1
          scaleType: UNIT_LOG_SCALE

To add the metric to the summary, I used the following code for my DNNRegressor:

import tensorflow as tf

def rmse(labels, predictions):
    # Custom evaluation metric, keyed by the tag named in config.yaml.
    pred_values = predictions['predictions']
    rmse = tf.keras.metrics.RootMeanSquaredError(name='root_mean_squared_error')
    rmse.update_state(labels, pred_values)
    return {'rmse': rmse}

def train_and_evaluate(hparams):
    ...
    estimator = tf.estimator.DNNRegressor(
        model_dir=output_dir,
        feature_columns=get_cols(),
        hidden_units=[max(2, int(FIRST_LAYER_SIZE * SCALE_FACTOR ** i))
                      for i in range(NUM_LAYERS)],
        optimizer=tf.keras.optimizers.Adam(learning_rate=LEARNING_RATE),
        config=run_config)
    # Wrap the estimator so that evaluation also reports 'rmse'.
    estimator = tf.estimator.add_metrics(estimator, rmse)

According to Google's documentation, the add_metrics function creates a new estimator with the specified metric, which is then used as the hyperparameter metric. However, the AI Platform Training service does not recognise this metric (screenshot: Job details on AI Platform).
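For context on how the values from config.yaml reach the trainer: AI Platform Training passes each tuned hyperparameter to the training application as a command-line flag named after its parameterName. The sketch below shows one way the trainer might read them. The function name parse_hparams is hypothetical, and parsing batch_size through float first is a defensive assumption in case the value arrives as "100.0".

```python
import argparse


def parse_hparams(argv=None):
    """Parse the hyperparameters named in config.yaml from command-line flags."""
    parser = argparse.ArgumentParser()
    # DISCRETE parameter; go through float in case the service sends "100.0".
    parser.add_argument("--batch_size", type=lambda s: int(float(s)), default=100)
    # DOUBLE parameter sampled on a log scale between 0.0001 and 0.1.
    parser.add_argument("--lr", type=float, default=0.001)
    # Ignore flags this sketch does not know about (for example --job-dir).
    args, _unknown = parser.parse_known_args(argv)
    return args


if __name__ == "__main__":
    hparams = parse_hparams()
    print(hparams.batch_size, hparams.lr)
```

The parsed values would then feed the batch size of the input function and the learning_rate of the optimizer.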

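When I run the code locally, the rmse metric does appear in the logs. So how do I make the metric available to the training job on AI Platform when using Estimators?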

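Additionally, there is the option of reporting metrics through the cloudml-hypertune Python package. However, it requires the metric value as one of its input arguments. How do I extract the metric from the tf.estimator.train_and_evaluate function (since that is the function I use to train and evaluate my estimator) to pass into the report_hyperparameter_tuning_metric function?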

ETA: The logs show no error. They say the job completed successfully even though it failed.

Solution

No working solution has been found for this problem yet.
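One direction that may be worth trying, offered as an untested sketch rather than a confirmed fix: in non-distributed mode, tf.estimator.train_and_evaluate returns a tuple whose first element is the dictionary of evaluation metrics, so the rmse value could be read from it and passed to cloudml-hypertune. The helper name report_eval_metric and the names train_spec and eval_spec are hypothetical. The TensorFlow documentation states that the return value is undefined for distributed training, so this assumes a single-worker job such as scaleTier: BASIC.

```python
def report_eval_metric(eval_result, reporter, tag="rmse"):
    """Forward one evaluation metric to a hypertune-style reporter.

    eval_result: dict of evaluation metrics, e.g. the first element of the
        tuple returned by tf.estimator.train_and_evaluate.
    reporter: any object exposing report_hyperparameter_tuning_metric(...),
        such as hypertune.HyperTune().
    """
    if not eval_result or tag not in eval_result:
        raise KeyError(f"metric {tag!r} missing from evaluation result")
    value = float(eval_result[tag])
    step = int(eval_result.get("global_step", 0))
    reporter.report_hyperparameter_tuning_metric(
        hyperparameter_metric_tag=tag,
        metric_value=value,
        global_step=step,
    )
    return value, step


# Intended use inside the trainer (needs tensorflow and cloudml-hypertune;
# train_spec and eval_spec are assumed to be defined by the trainer):
#   import hypertune
#   eval_result, _ = tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)
#   report_eval_metric(eval_result, hypertune.HyperTune())
```

The tag passed here should match hyperparameterMetricTag in config.yaml, and the key must match the one returned by the rmse metric function.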

gcp-ai-platform-training google-ai-platform google-cloud-platform machine-learning tensorflow
