Azure DevOps ML error: We could not find config.json in: /home/vsts/work/1/s or in its parent directories

Problem description

I am trying to create an Azure DevOps ML pipeline. The following code works 100% correctly in Jupyter Notebooks, but when I run it in Azure DevOps I get this error:

Traceback (most recent call last):
  File "src/my_custom_package/data.py", line 26, in <module>
    ws = Workspace.from_config()
  File "/opt/hostedtoolcache/Python/3.8.7/x64/lib/python3.8/site-packages/azureml/core/workspace.py", line 258, in from_config
    raise UserErrorException('We Could not find config.json in: {} or in its parent directories. '
azureml.exceptions._azureml_exception.UserErrorException: UserErrorException:
    Message: We Could not find config.json in: /home/vsts/work/1/s or in its parent directories. Please provide the full path to the config file or ensure that config.json exists in the parent directories.
    InnerException None
    ErrorResponse 
{
    "error": {
        "code": "UserError",
        "message": "We Could not find config.json in: /home/vsts/work/1/s or in its parent directories. Please provide the full path to the config file or ensure that config.json exists in the parent directories."
    }
}

The code is:

# imports
from sklearn.model_selection import train_test_split
from azureml.core.workspace import Workspace
from azureml.train.automl import AutoMLConfig
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException
from azureml.core.experiment import Experiment
from datetime import date
from azureml.core import Workspace, Dataset

import pandas as pd
import numpy as np
import logging

# get data
subscription_id = 'mysubid'
resource_group = 'myrg'
workspace_name = 'mlplayground'
workspace = Workspace(subscription_id, resource_group, workspace_name)
dataset = Dataset.get_by_name(workspace, name='correctData')

# auto ml
ws = Workspace.from_config()

automl_settings = {
    "iteration_timeout_minutes": 2880,
    "experiment_timeout_hours": 48,
    "enable_early_stopping": True,
    "primary_metric": 'spearman_correlation',
    "featurization": 'auto',
    "verbosity": logging.INFO,
    "n_cross_validations": 5,
    "max_concurrent_iterations": 4,
    "max_cores_per_iteration": -1,
}

cpu_cluster_name = "computecluster"
compute_target = ComputeTarget(workspace=ws, name=cpu_cluster_name)
print(compute_target)
automl_config = AutoMLConfig(
    task='regression',
    compute_target=compute_target,
    debug_log='automated_ml_errors.log',
    training_data=dataset,
    label_column_name="paidindays",
    **automl_settings,
)

today = date.today()
d4 = today.strftime("%b-%d-%Y")

experiment = Experiment(ws, "myexperiment" + d4)
remote_run = experiment.submit(automl_config, show_output=True)

from azureml.widgets import RunDetails
RunDetails(remote_run).show()

remote_run.wait_for_completion()

Solution

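You need to pass the path of the config file to Workspace.from_config(). The documentation at https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.workspace.workspace?view=azure-ml-py gives the following instructions for creating the config file. Create a workspace: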

from azureml.core import Workspace
ws = Workspace.create(name='myworkspace',
                      subscription_id='<azure-subscription-id>',
                      resource_group='myresourcegroup',
                      create_resource_group=True,
                      location='eastus2')

Save the workspace configuration:

ws.write_config(path="./file-path", file_name="config.json")
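
On a hosted build agent there is usually no saved config.json in the checkout, so one option is to generate the file in a pipeline step before the script runs. The following is a minimal sketch, assuming the file only needs the three keys subscription_id, resource_group and workspace_name and that it may live in a .azureml subdirectory; the helper name write_workspace_config is hypothetical and not part of the SDK.

```python
import json
from pathlib import Path


def write_workspace_config(subscription_id, resource_group, workspace_name,
                           directory="."):
    """Write <directory>/.azureml/config.json and return its path.

    Hypothetical helper: it assumes the three-key JSON layout described
    above rather than calling the Azure ML SDK.
    """
    config_dir = Path(directory) / ".azureml"
    config_dir.mkdir(parents=True, exist_ok=True)
    config_path = config_dir / "config.json"
    config = {
        "subscription_id": subscription_id,
        "resource_group": resource_group,
        "workspace_name": workspace_name,
    }
    config_path.write_text(json.dumps(config, indent=4), encoding="utf-8")
    return config_path
```

Calling write_workspace_config('mysubid', 'myrg', 'mlplayground') from the repository root would then give Workspace.from_config() a file to find, provided the script is started from that directory or one below it.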

Load the configuration from the default path:

ws = Workspace.from_config()
ws.get_details()

Or load the configuration from a specified path:

ws = Workspace.from_config(path="my/path/config.json")

More details on how to create a workspace with from_config can be found here: https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.workspace.workspace?view=azure-ml-py#from-config-path-none--auth-none---logger-none---file-name-none-
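
To see why the lookup fails on the agent, it can help to reproduce the search yourself. The sketch below is only an approximation of what from_config does, assuming it checks each directory from the starting point upwards for .azureml/config.json, aml_config/config.json, or a bare config.json; the function name find_config and the exact order of candidates are assumptions, not the SDK's implementation.

```python
from pathlib import Path

# Locations checked in every directory (assumed, see the note above).
CANDIDATES = (
    Path(".azureml") / "config.json",
    Path("aml_config") / "config.json",
    Path("config.json"),
)


def find_config(start="."):
    """Return the first config.json found in start or its parents, else None."""
    current = Path(start).resolve()
    for directory in (current, *current.parents):
        for candidate in CANDIDATES:
            path = directory / candidate
            if path.is_file():
                return path
    return None
```

Printing find_config() as the first step of the pipeline script shows whether any config file is reachable from the working directory (/home/vsts/work/1/s in the error above). If it returns None, either commit or generate a config file, or pass an explicit path to from_config.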


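Something odd is happening in your code: you get the data from one workspace (workspace = Workspace(subscription_id, resource_group, workspace_name)) and then use resources from a second one (ws = Workspace.from_config()). I would suggest avoiding code that depends on two different workspaces, especially since a single underlying data source can be registered (linked) to multiple workspaces (see the documentation).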
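
As a rough illustration of using one workspace for everything, the sketch below takes a single Workspace object and looks up both the dataset and the compute target through it. The function name load_training_inputs is hypothetical; Dataset.get_by_name and ComputeTarget are the same calls the question already uses.

```python
def load_training_inputs(ws, dataset_name, cluster_name):
    """Fetch the dataset and the compute target from the same workspace."""
    # Imported lazily so this module can be imported without the SDK installed.
    from azureml.core import Dataset
    from azureml.core.compute import ComputeTarget

    dataset = Dataset.get_by_name(ws, name=dataset_name)
    compute_target = ComputeTarget(workspace=ws, name=cluster_name)
    return dataset, compute_target
```

With the question's names, this would be called as load_training_inputs(workspace, 'correctData', 'computecluster'), after which the Workspace.from_config() line is no longer needed and the same workspace object can be passed to Experiment.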

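In general, using a config.json file when instantiating a Workspace object results in interactive authentication. When your code runs, you will get a log message asking you to visit a specific URL and enter a code. This uses your Microsoft account to verify that you are authorized to access the Azure resource (in this case your Workspace('mysubid', 'myrg', 'mlplayground')). This has its limitations once you start deploying your code to virtual machines or agents, where you will not always be checking the logs, visiting the URL, and authenticating manually.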

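For that reason, I strongly recommend setting up a more advanced authentication method. I personally suggest the service principal approach, because it is simple, convenient, and secure when done properly. You can follow Azure's official documentation on service principal authentication.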
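
A minimal sketch of the service principal approach could look like the following. It assumes the credentials are exposed to the pipeline as environment variables (for example as secret pipeline variables); the variable names and the helpers read_settings and get_workspace are hypothetical, while ServicePrincipalAuthentication and the auth argument of Workspace come from azureml-core.

```python
import os

# Hypothetical environment variable names; use whatever your pipeline defines.
REQUIRED_VARS = (
    "AZURE_TENANT_ID",
    "AZURE_CLIENT_ID",
    "AZURE_CLIENT_SECRET",
    "AZURE_SUBSCRIPTION_ID",
    "AML_RESOURCE_GROUP",
    "AML_WORKSPACE_NAME",
)


def read_settings(env=None):
    """Collect the required settings, failing early if any are missing."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}


def get_workspace(env=None):
    """Return a Workspace authenticated non-interactively via a service principal."""
    # Imported lazily so read_settings can be used without the SDK installed.
    from azureml.core import Workspace
    from azureml.core.authentication import ServicePrincipalAuthentication

    settings = read_settings(env)
    auth = ServicePrincipalAuthentication(
        tenant_id=settings["AZURE_TENANT_ID"],
        service_principal_id=settings["AZURE_CLIENT_ID"],
        service_principal_password=settings["AZURE_CLIENT_SECRET"],
    )
    return Workspace(
        subscription_id=settings["AZURE_SUBSCRIPTION_ID"],
        resource_group=settings["AML_RESOURCE_GROUP"],
        workspace_name=settings["AML_WORKSPACE_NAME"],
        auth=auth,
    )
```

With this in place, ws = get_workspace() needs neither config.json nor a browser login, which is what an unattended Azure DevOps agent requires. The service principal itself still has to be created and granted access to the workspace beforehand.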