AzureML pipeline: how to access data from step1 in step2

Problem description

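I am following this Microsoft article to create an Azure ML pipeline with two steps, and I want step2 to use the data written by step1. According to the article, the code below should pass the path of the data written by step1 to the step2 script as an argument: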

datastore = workspace.datastores['my_adlsgen2']
step1_output_data = OutputFileDatasetConfig(name="processed_data",destination=(datastore,"mypath/{run-id}/{output-name}")).as_upload()

step1 = PythonScriptStep(
    name="generate_data",script_name="step1.py",runconfig = aml_run_config,arguments = ["--output_path",step1_output_data]
)

step2 = PythonScriptStep(
    name="read_pipeline_data",script_name="step2.py",compute_target=compute,arguments = ["--pd",step1_output_data.as_input]

)

pipeline = Pipeline(workspace=ws,steps=[step1,step2])

But when I read the pd argument in step2.py, its value is

">"

Any idea how to pass the blob storage location that step1 wrote to into step2, so that step2 can read the data?

Solution

You may find what you need here: https://docs.microsoft.com/en-us/azure/machine-learning/how-to-move-data-in-out-of-pipelines. Pay particular attention to the section "Read OutputFileDatasetConfig as inputs to non-initial steps":

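from azureml.data import OutputFileDatasetConfig
from azureml.pipeline.core import Pipeline
from azureml.pipeline.steps import PythonScriptStep

# ws (Workspace), aml_run_config (RunConfiguration) and compute (compute target)
# are assumed to be defined earlier in your script.

# get adls gen 2 datastore already registered with the workspace
datastore = ws.datastores['my_adlsgen2']
step1_output_data = OutputFileDatasetConfig(name="processed_data", destination=(datastore, "mypath/{run-id}/{output-name}")).as_upload()

step1 = PythonScriptStep(
    name="generate_data", script_name="step1.py", runconfig=aml_run_config, arguments=["--output_path", step1_output_data]
)

# note the parentheses: as_input() is called, not passed as a bare attribute
step2 = PythonScriptStep(
    name="read_pipeline_data", script_name="step2.py", compute_target=compute, arguments=["--pd", step1_output_data.as_input()]
)

pipeline = Pipeline(workspace=ws, steps=[step1, step2])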
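
On the consuming side, step2.py only has to parse the --pd argument. The following is a minimal sketch, assuming the value arrives as a path to a directory containing the files written by step1. The helper names parse_args and list_input_files are hypothetical and are not part of the Azure ML SDK.

```python
import argparse
import os


def parse_args(argv):
    """Parse the arguments that the pipeline passes to step2.py."""
    parser = argparse.ArgumentParser()
    # --pd matches the argument name used in the PythonScriptStep definition
    parser.add_argument("--pd", dest="pd", required=True,
                        help="directory containing the output of step1")
    # parse_known_args ignores any extra arguments the caller may append
    args, _unknown = parser.parse_known_args(argv)
    return args


def list_input_files(path):
    """Return the sorted full paths of all files found under path."""
    found = []
    for root, _dirs, files in os.walk(path):
        for name in files:
            found.append(os.path.join(root, name))
    return sorted(found)


def main(argv):
    args = parse_args(argv)
    # Assumption: args.pd is a directory that is readable from the compute target
    for file_path in list_input_files(args.pd):
        print("found input file:", file_path)
```

In step2.py you would call main(sys.argv[1:]) at the end of the script (after importing sys) and then load the listed files with whatever reader fits their format.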

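Your mistake is most likely that as_input is a method of OutputFileDatasetConfig, not an attribute, so it has to be called as as_input(). Without the parentheses you pass the method object itself rather than the input configuration it returns, and step2 then receives the string representation of that method instead of a data path.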
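
The difference is plain Python behaviour and can be reproduced without Azure ML. The sketch below uses FakeOutputConfig, a hypothetical stand-in class that is not the real OutputFileDatasetConfig, to show what you get with and without the parentheses.

```python
class FakeOutputConfig:
    """Hypothetical stand-in for OutputFileDatasetConfig (not the real SDK class)."""

    def __init__(self, name):
        self.name = name

    def as_input(self):
        # The real method returns an input configuration; a string is enough here
        return f"input:{self.name}"


cfg = FakeOutputConfig("processed_data")

without_call = cfg.as_input    # the bound method object itself
with_call = cfg.as_input()     # the value returned by the method

# Converting the bound method to a string yields its repr, not a data path
print(str(without_call).startswith("<bound method"))
print(with_call)
```

Anything that turns the arguments list into strings will therefore see a "<bound method ...>" text for the first form and the actual return value for the second.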

azureml azureml-python-sdk outputfiledatasetconfig