MLflow in Docker: unable to store artifacts on an atmoz SFTP server

Problem description

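I want to run MLflow with Docker only (with no cloud storage such as S3 or Blob at all). So I followed this guide and tried to set the artifact store to an atmoz sftp server running in another Docker container. As suggested in the MLflow docs, I tried to authenticate with host keys. However, when I try to log my artifacts, I get the following error: pysftp.exceptions.CredentialException: No password or key specified.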
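For context on the error message, here is a minimal sketch of the default-key lookup that pysftp appears to perform when it is called without a password or a private key. It does not import pysftp. The function name find_default_key and the list of default key names are assumptions based on my reading of pysftp 0.2.9, not a copy of its code. The point it illustrates is that a key stored under a name such as ssh_host_rsa_key would not be picked up by such a lookup.

```python
import os
import tempfile

# Assumption: these are the default key file names pysftp falls back to
# when neither a password nor a private_key is passed (pysftp 0.2.9).
DEFAULT_KEY_NAMES = ("id_rsa", "id_dsa")


def find_default_key(home_dir):
    """Return the first default private key under home_dir/.ssh, or None."""
    for name in DEFAULT_KEY_NAMES:
        candidate = os.path.join(home_dir, ".ssh", name)
        if os.path.exists(candidate):
            return candidate
    return None


# Demo in a throwaway directory that stands in for a user's home directory.
with tempfile.TemporaryDirectory() as home:
    ssh_dir = os.path.join(home, ".ssh")
    os.makedirs(ssh_dir)

    # A key stored under the name used in the compose file is not found.
    open(os.path.join(ssh_dir, "ssh_host_rsa_key"), "w").close()
    found_with_host_key_name = find_default_key(home)

    # The same (empty placeholder) file under a default name is found.
    open(os.path.join(ssh_dir, "id_rsa"), "w").close()
    found_with_default_name = find_default_key(home)

print(found_with_host_key_name)
print(os.path.basename(found_with_default_name))
```

If that reading is right, "No password or key specified" would mean that the process opening the SFTP connection was given no explicit credentials and found no default key in its own home directory.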

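I guess something is wrong with my host key setup. I also tried to follow this guide (mentioned in this question), but unfortunately it is not detailed enough for my limited knowledge of containers, SFTP servers and public/private key setup. My docker-compose looks like this...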

services:
  db:
    restart: always
    image: mysql/mysql-server:5.7.28
    container_name: mlflow_db
    expose:
      - "3306"
    networks:
      - backend
    environment:
      - MYSQL_DATABASE=${MYSQL_DATABASE}
      - MYSQL_USER=${MYSQL_USER}
      - MYSQL_PASSWORD=${MYSQL_PASSWORD}
      - MYSQL_ROOT_PASSWORD=${MYSQL_ROOT_PASSWORD}
    volumes:
      - dbdata:/var/lib/mysql

  mlflow-sftp:
    image: atmoz/sftp
    container_name: mlflow-sftp
    ports:
      - "2222:22"
    volumes:
      - ./storage/sftp:/home/foo/storage
      - ./ssh_host_ed25519_key:/home/foo/.ssh/ssh_host_ed25519_key.pub:ro
      - ./ssh_host_rsa_key:/home/foo/.ssh/ssh_host_rsa_key.pub:ro
    command: foo::1001
    networks:
      - backend

  web:
    restart: always
    build: ./mlflow
    depends_on:
      - mlflow-sftp
    image: mlflow_server
    container_name: mlflow_server
    expose:
      - "5000"
    networks:
      - frontend
      - backend
    volumes:
      - ./ssh_host_ed25519_key:/root/.ssh/ssh_host_ed25519_key:ro
      - ./ssh_host_rsa_key:/root/.ssh/ssh_host_rsa_key:ro
    command: >
      bash -c "sleep 3
      && ssh-keyscan -H mlflow-sftp >> ~/.ssh/known_hosts
      && mlflow server --backend-store-uri mysql+pymysql://${MYSQL_USER}:${MYSQL_PASSWORD}@db:3306/${MYSQL_DATABASE} --default-artifact-root sftp://foo@localhost:2222/storage --host 0.0.0.0"

  nginx:
    restart: always
    build: ./nginx
    image: mlflow_nginx
    container_name: mlflow_nginx
    ports:
      - "80:80"
    networks:
      - frontend
    depends_on:
      - web

networks:
  frontend:
    driver: bridge
  backend:
    driver: bridge

volumes:
  dbdata:
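To make explicit what the --default-artifact-root value in the compose file carries, here is a small sketch that splits it with Python's standard urllib.parse. Only the URI string comes from the compose file; the variable names are illustrative.

```python
from urllib.parse import urlparse

# The artifact root exactly as passed to `mlflow server` in the compose file.
artifact_root = "sftp://foo@localhost:2222/storage"

parts = urlparse(artifact_root)

print("user:", parts.username)
print("password:", parts.password)
print("host:", parts.hostname)
print("port:", parts.port)
print("path:", parts.path)
```

The URI names a user but no password, so whichever process opens the connection has to supply a key. Port 2222 is the host side of the "2222:22" mapping, and localhost resolves differently on the host than inside a container. As far as I know, when artifacts are not proxied through the tracking server, the MLflow client process uploads them itself, so it may be worth checking which machine actually resolves this URI.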

...and then, in my Python script, I create a new MLflow experiment as follows.

from urllib.parse import urlparse

import mlflow
import mlflow.sklearn
from sklearn.linear_model import ElasticNet

# alpha, l1_ratio, train_x, train_y, test_x, test_y and eval_metrics are
# assumed to be defined earlier in the script (not shown here).
remote_server_uri = "http://localhost:80"
mlflow.set_tracking_uri(remote_server_uri)
EXPERIMENT_NAME = "test43"
mlflow.create_experiment(EXPERIMENT_NAME)  # ,artifact_location=ARTIFACT_URI)
mlflow.set_experiment(EXPERIMENT_NAME)
with mlflow.start_run():
    print(mlflow.get_artifact_uri())
    print(mlflow.get_registry_uri())
    lr = ElasticNet(alpha=alpha,l1_ratio=l1_ratio,random_state=42)
    lr.fit(train_x,train_y)

    predicted_qualities = lr.predict(test_x)

    (rmse,mae,r2) = eval_metrics(test_y,predicted_qualities)

    print("Elasticnet model (alpha=%f,l1_ratio=%f):" % (alpha,l1_ratio))
    print("  RMSE: %s" % rmse)
    print("  MAE: %s" % mae)
    print("  R2: %s" % r2)

    mlflow.log_param("alpha",alpha)
    mlflow.log_param("l1_ratio",l1_ratio)
    mlflow.log_metric("rmse",rmse)
    mlflow.log_metric("r2",r2)
    mlflow.log_metric("mae",mae)

    tracking_url_type_store = urlparse(mlflow.get_tracking_uri()).scheme

    if tracking_url_type_store != "file":
        mlflow.sklearn.log_model(lr,"model",registered_model_name="ElasticnetWineModel")
    else:
        mlflow.sklearn.log_model(lr,"model")

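I have not modified the Dockerfiles from the first guide mentioned above, i.e. you can see them here. My guess is that I messed up the host keys, perhaps by putting them in the wrong directory, but after hours of brute-force trial and error I hope someone can point me in the right direction. Let me know if anything is missing to reproduce the error.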
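Since the compose file mounts files named ssh_host_ed25519_key and ssh_host_rsa_key onto paths ending in .pub, one quick check is whether each file actually holds a private or a public key. The sketch below classifies a key file by its first line; the helper name classify_key_text and the sample strings are hypothetical placeholders, not real keys.

```python
def classify_key_text(text):
    """Guess whether key file content is a private or a public SSH key."""
    stripped = text.strip()
    if not stripped:
        return "unknown"
    first_line = stripped.splitlines()[0]
    # PEM and OpenSSH private keys start with a BEGIN ... PRIVATE KEY header.
    if first_line.startswith("-----BEGIN") and "PRIVATE KEY" in first_line:
        return "private"
    # OpenSSH public keys are one line starting with the key type.
    if first_line.startswith(("ssh-", "ecdsa-sha2-")):
        return "public"
    return "unknown"


# Fake, truncated samples for illustration only.
private_sample = "-----BEGIN OPENSSH PRIVATE KEY-----\nAAAA\n-----END OPENSSH PRIVATE KEY-----\n"
public_sample = "ssh-ed25519 AAAA foo@example\n"

print(classify_key_text(private_sample))
print(classify_key_text(public_sample))
print(classify_key_text("not a key"))
```

To run it against the real files, read each one with open(path).read() and pass the text in. A file that reports "private" but is mounted under a .pub name on the server side would be a sign that the two halves of the key pair were swapped.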
