GCP Vertex AI Pipeline fails with an error at the endpoint deployment step

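Problem description

I built a custom Kubeflow pipeline from a combination of AutoML components and custom Kubeflow components.

When I deploy the pipeline, it fails with the following error: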

textPayload: "The replica workerpool0-0 exited with a non-zero status of 1. Termination reason: 
 Error. To find out more about why your job exited please check the logs: 
 https://console.cloud.google.com/logs/viewer? project=205438435937&resource=ml_job%2Fjob_id%XXXXXXXXXXXXXXXX&advancedFilter=resource.type%3D 
 %22ml_job%22%0Aresource.labels.job_id%3D%XXXXXXXXXXXXXXXXXXXX%22"
insertId: "ibt166bgd"
resource: {
 type: "ml_job"
  labels: {
   job_id: "XXXXXXXXXXXXXXXXXX"
   task_name: "service"
   project_id: "XXXXXXX-XXXXXX"
  }
 }
 timestamp: "2021-06-10T12:18:53.807150835Z"
 severity: "ERROR"
 labels: {
  ml.googleapis.com/endpoint: ""
 }
 logName: "projects/XXXXXXX-XXXXXX/logs/ml.googleapis.com%XXXXXXXXXXXXXXXXXXXX"
 receiveTimestamp: "2021-06-10T12:18:55.087983509Z"
}
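
The URL in textPayload carries a percent-encoded Cloud Logging filter, and the job's own logs behind that filter usually hold the actual traceback. As a minimal sketch, assuming the google-cloud-logging package is installed and credentials are configured, the same filter can be rebuilt and queried from Python. The helper names are hypothetical, and YOUR_JOB_ID stands in for the job ID that is masked in the log above.

```python
from itertools import islice
from urllib.parse import unquote


def build_ml_job_filter(job_id: str) -> str:
    """Build the filter that the console URL encodes.

    Cloud Logging treats terms on separate lines as implicitly ANDed.
    """
    return f'resource.type="ml_job"\nresource.labels.job_id="{job_id}"'


def decode_advanced_filter(encoded: str) -> str:
    """Decode the percent-encoded advancedFilter value taken from the URL."""
    return unquote(encoded)


def fetch_job_logs(project_id: str, job_id: str, limit: int = 50) -> list:
    """Return up to `limit` of the newest log entries for the job.

    Assumption: google-cloud-logging is installed and the caller is
    authenticated; the import is deferred so the helpers above work without it.
    """
    from google.cloud import logging as cloud_logging

    client = cloud_logging.Client(project=project_id)
    entries = client.list_entries(
        filter_=build_ml_job_filter(job_id),
        order_by=cloud_logging.DESCENDING,
    )
    return list(islice(entries, limit))


if __name__ == "__main__":
    # Prints the two-line filter; paste it into the Logs Explorer if preferred.
    print(build_ml_job_filter("YOUR_JOB_ID"))
```

Calling fetch_job_logs(project_id, "YOUR_JOB_ID") and printing each entry's payload should surface the error raised inside the failing replica, which says more than the generic "exited with a non-zero status" message.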

Here is my pipeline configuration:

# Imports are assumed from the SDK versions of the time (kfp v2 and
# google-cloud-pipeline-components); adjust them to your installed versions.
import kfp
from google_cloud_pipeline_components import aiplatform as gcc_aip
from kfp.v2 import compiler
from kfp.v2.google.client import AIPlatformClient

# project_id, region, pipeline_root_path, service_account, display_name and the
# custom `preprocess` component are defined elsewhere (not shown in the question).


# Kubeflow pipeline defined by a Python function
@kfp.dsl.pipeline(name="sales-prediction-iowa", pipeline_root=pipeline_root_path)
def pipeline(project_id: str):
    # Custom component; its output is used as the dataset's GCS source
    pre_process = preprocess(project_id=project_id)

    create_dataset = gcc_aip.TabularDatasetCreateOp(
        project=project_id,
        display_name=display_name,
        # gcs_source="gs://vertex-ai-pipeline-bucket/iowa-2020_pre-processed.csv"
        gcs_source=pre_process.output,
    )

    training_job_run_op = gcc_aip.AutoMLTabularTrainingJobRunOp(
        project=project_id,
        display_name="training-iowa-sales",
        optimization_prediction_type="regression",
        dataset=create_dataset.outputs["dataset"],
        model_display_name="iowa-sales-model",
        target_column="sale_dollars",
        training_fraction_split=0.8,
        validation_fraction_split=0.1,
        test_fraction_split=0.1,
        budget_milli_node_hours=8000,
    )

    endpoint_op = gcc_aip.ModelDeployOp(
        project=project_id,
        model=training_job_run_op.outputs["model"],
    )


compiler.Compiler().compile(pipeline_func=pipeline, package_path="iowa-pipeline-job.json")

api_client = AIPlatformClient(project_id=project_id, region=region)

response = api_client.create_run_from_job_spec(
    "iowa-pipeline-job.json",
    pipeline_root=pipeline_root_path,
    service_account=service_account,
    parameter_values={
        "project_id": project_id,
        # 'region': region,
        # 'pipeline_root_path': pipeline_root_path,
        # 'service_account': service_account,
        # 'display_name': display_name
    },
)

I have a sneaking suspicion that it might be related to the region, but if it is something else, please let me know.

Thanks in advance!
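
One way to test the region suspicion is to compare the location of the bucket behind pipeline_root_path with the region passed to AIPlatformClient. The sketch below assumes the google-cloud-storage package and working credentials. The helper names are hypothetical, and a mismatch is only something worth investigating, not a confirmed cause of this failure. The comparison is deliberately strict, so a multi-region bucket such as US will also be reported as different from us-central1.

```python
from urllib.parse import urlparse


def bucket_name_from_uri(gcs_uri: str) -> str:
    """Extract the bucket name from a gs://bucket/path URI."""
    parsed = urlparse(gcs_uri)
    if parsed.scheme != "gs" or not parsed.netloc:
        raise ValueError(f"not a gs:// URI: {gcs_uri!r}")
    return parsed.netloc


def same_location(bucket_location: str, region: str) -> bool:
    """Case-insensitive exact comparison of a bucket location and a region."""
    return bucket_location.strip().lower() == region.strip().lower()


def check_pipeline_root_region(pipeline_root_path: str, region: str, project_id: str) -> bool:
    """Return True if the pipeline root bucket is located in `region`.

    Assumption: google-cloud-storage is installed and the caller may read
    bucket metadata; the import is deferred so the helpers above work without it.
    """
    from google.cloud import storage

    client = storage.Client(project=project_id)
    bucket = client.get_bucket(bucket_name_from_uri(pipeline_root_path))
    return same_location(bucket.location, region)
```

Running check_pipeline_root_region(pipeline_root_path, region, project_id) before submitting the job gives a quick yes or no on whether the two locations agree.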

Solution

No working solution to this problem has been found yet.
