Why does MLflow fail to log metrics and artifacts when an MLflow Project runs in a Docker environment?

Problem description

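I am running an MLProject in a Docker environment and trying to store its metrics and artifacts on the host. I expected that, once the experiment completes successfully, the artifacts and metrics folders under mlruns/ would be populated and the run would show up in mlflow ui. Instead, the artifacts and metrics folders under mlruns/ are empty, and mlflow ui does not show the new experiment either.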

/home/mlflow_demo/mlflow-demo.py -

import mlflow
from mlflow.tracking import MlflowClient
from random import random
import pickle

# Create a fresh experiment and a run inside it
client = MlflowClient()
experiment_id = client.create_experiment(name='first experiment')
run = client.create_run(experiment_id=experiment_id)

# Log 1000 random values under the metric key "foo"
for i in range(1000):
    client.log_metric(run.info.run_id, "foo", random(), step=i)

# Write a small text file and log it as an artifact
with open("test.txt", "w") as f:
    f.write("This is an artifact file")
client.log_artifact(run.info.run_id, "test.txt")

# Mark the run as finished
client.set_terminated(run.info.run_id)

/home/mlflow_demo/MLProject -

name: test-project
docker_env:
 image: kusur/apex-pytorch-image:latest
entry_points:
 main:
  command: "python mlflow-demo.py"

Command (run from /home/mlflow_demo): mlflow run .

Running the above produces the following log:

2021/07/06 12:22:28 INFO mlflow.projects.docker: === Building docker image test-project ===
2021/07/06 12:22:28 INFO mlflow.projects.utils: === Created directory /home/mlflow_demo/mlruns/tmpwa8ydc5j for downloading remote URIs passed to arguments of type 'path' ===
2021/07/06 12:22:28 INFO mlflow.projects.backend.local: === Running command 'docker run --rm -v /home/mlflow_demo/mlruns:/mlflow/tmp/mlruns -v /home/mlflow_demo/mlruns/0/0978fdd89ba44bf7b49975ab84838e82/artifacts:/home/mlflow_demo/mlruns/0/0978fdd89ba44bf7b49975ab84838e82/artifacts -e MLFLOW_RUN_ID=0978fdd89ba44bf7b49975ab84838e82 -e MLFLOW_TRACKING_URI=file:///mlflow/tmp/mlruns -e MLFLOW_EXPERIMENT_ID=0 test-project:latest python mlflow-demo.py' in run with ID '0978fdd89ba44bf7b49975ab84838e82' ===

...

2021/07/06 12:22:33 INFO mlflow.projects: === Run (ID '0978fdd89ba44bf7b49975ab84838e82') succeeded ===

Even so, the folders mlruns/0/0978fdd89ba44bf7b49975ab84838e82/artifacts and mlruns/0/0978fdd89ba44bf7b49975ab84838e82/metrics are empty.

Can anyone offer some pointers? If the question is not framed well enough, please let me know.

Solution

You posted your code as

for i in range(1000):
client.log_metric(run.info.run_id,"foo",random(),step=i)

It should look like this

for i in range(1000):
    client.log_metric(run.info.run_id,"foo",random(),step=i)

The same goes for this block, whose body has no indentation at all

with open("test.txt","w") as f:

Could you update the code with proper Python indentation and run it again?
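Apart from indentation, one more thing in the log may be worth checking. The `docker run` command passes `-e MLFLOW_RUN_ID=0978fdd89ba44bf7b49975ab84838e82` and `-e MLFLOW_EXPERIMENT_ID=0` into the container, while the script creates its own experiment and run through `create_experiment` and `create_run`. Whatever it logs would then belong to that new run, not to run 0978fdd89ba44bf7b49975ab84838e82, whose folders you are inspecting. Below is a minimal sketch that logs to the run named in the environment variable instead. It is an untested assumption about your setup, not a confirmed fix, and `log_to_existing_run` is a hypothetical helper name introduced only for this example.

```python
import os
from random import random


def log_to_existing_run(client, run_id, n_steps=1000, artifact_path="test.txt"):
    """Log demo metrics and one artifact to a run that already exists.

    `client` is expected to behave like mlflow.tracking.MlflowClient
    (only log_metric and log_artifact are used here).
    """
    for i in range(n_steps):
        client.log_metric(run_id, "foo", random(), step=i)

    with open(artifact_path, "w") as f:
        f.write("This is an artifact file")
    client.log_artifact(run_id, artifact_path)


if __name__ == "__main__":
    # Assumption: `mlflow run` sets MLFLOW_RUN_ID in the container,
    # as the `-e MLFLOW_RUN_ID=...` flag in the log suggests.
    run_id = os.environ.get("MLFLOW_RUN_ID")
    if run_id is None:
        print("MLFLOW_RUN_ID is not set; was this started by `mlflow run`?")
    else:
        from mlflow.tracking import MlflowClient

        # No create_experiment / create_run: reuse the run made by `mlflow run`.
        log_to_existing_run(MlflowClient(), run_id)
```

The sketch leaves out `set_terminated` because the log shows `mlflow run` reporting the run status itself ("Run ... succeeded"). If the folders are still empty after this change, the cause lies elsewhere.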