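Problem description
I deployed a custom Kubeflow pipeline that combines AutoML components with a custom Kubeflow component.
When I deploy the pipeline, it fails with the following error: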
textPayload: "The replica workerpool0-0 exited with a non-zero status of 1. Termination reason:
Error. To find out more about why your job exited please check the logs:
https://console.cloud.google.com/logs/viewer?project=205438435937&resource=ml_job%2Fjob_id%XXXXXXXXXXXXXXXX&advancedFilter=resource.type%3D
%22ml_job%22%0Aresource.labels.job_id%3D%XXXXXXXXXXXXXXXXXXXX%22"
insertId: "ibt166bgd"
resource: {
type: "ml_job"
labels: {
job_id: "XXXXXXXXXXXXXXXXXX"
task_name: "service"
project_id: "XXXXXXX-XXXXXX"
}
}
timestamp: "2021-06-10T12:18:53.807150835Z"
severity: "ERROR"
labels: {
ml.googleapis.com/endpoint: ""
}
logName: "projects/XXXXXXX-XXXXXX/logs/ml.googleapis.com%XXXXXXXXXXXXXXXXXXXX"
receiveTimestamp: "2021-06-10T12:18:55.087983509Z"
}
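The URL in the error message carries a percent-encoded Cloud Logging filter for the failed job. As a minimal sketch, the same filter can be rebuilt by hand so the full worker logs can be searched for the underlying error. The helper names and the job id below are hypothetical; only the filter fields (`resource.type` and `resource.labels.job_id`) come from the log entry above.

```python
from urllib.parse import quote


def ml_job_log_filter(job_id: str) -> str:
    """Build the two-line filter that the error message links to."""
    return f'resource.type="ml_job"\nresource.labels.job_id="{job_id}"'


def encode_filter(filter_text: str) -> str:
    """Percent-encode the filter for use as the advancedFilter URL parameter."""
    return quote(filter_text, safe="")


# Hypothetical job id; use the real one from the failed run.
log_filter = ml_job_log_filter("1234567890")
print(log_filter)
print(encode_filter(log_filter))
```

The unencoded filter can be pasted into the Logs Explorer query box; the encoded form matches the shape of the `advancedFilter` parameter in the URL above.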
Here is my pipeline configuration:
# Imports are assumed (they were not shown in the post); they match the KFP v2 SDK used here.
import kfp
from kfp.v2 import compiler
from kfp.v2.google.client import AIPlatformClient
from google_cloud_pipeline_components import aiplatform as gcc_aip

# Kubeflow pipeline defined by a Python function.
# preprocess, pipeline_root_path, display_name, region and service_account are defined elsewhere.
@kfp.dsl.pipeline(name="sales-prediction-iowa", pipeline_root=pipeline_root_path)
def pipeline(project_id: str):
    # Custom component; its output is used as the dataset source below.
    pre_process = preprocess(project_id=project_id)
    create_dataset = gcc_aip.TabularDatasetCreateOp(
        project=project_id, display_name=display_name,
        # gcs_source="gs://vertex-ai-pipeline-bucket/iowa-2020_pre-processed.csv"
        gcs_source=pre_process.output,
    )
    training_job_run_op = gcc_aip.AutoMLTabularTrainingJobRunOp(
        project=project_id,
        display_name="training-iowa-sales",
        optimization_prediction_type="regression",
        dataset=create_dataset.outputs["dataset"],
        model_display_name="iowa-sales-model",
        target_column="sale_dollars",
        training_fraction_split=0.8, validation_fraction_split=0.1, test_fraction_split=0.1,
        budget_milli_node_hours=8000,
    )
    endpoint_op = gcc_aip.ModelDeployOp(
        project=project_id, model=training_job_run_op.outputs["model"]
    )

# Compile the pipeline to a job spec, then submit it.
compiler.Compiler().compile(pipeline_func=pipeline, package_path='iowa-pipeline-job.json')

api_client = AIPlatformClient(project_id=project_id, region=region)
response = api_client.create_run_from_job_spec(
    'iowa-pipeline-job.json',
    pipeline_root=pipeline_root_path,
    service_account=service_account,
    parameter_values={
        'project_id': project_id,
        # 'region': region, 'pipeline_root_path': pipeline_root_path,
        # 'service_account': service_account, 'display_name': display_name
    },
)
I have a sneaking suspicion that it may be related to the region, but if it is something else, please let me know.
Thanks in advance!
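The region hypothesis is unconfirmed, but it can be tested. In the code above only `AIPlatformClient` receives a region; the dataset, training and deploy components do not set one explicitly. A minimal sketch, assuming you look up each resource's location yourself (for example in the Cloud Console), is to compare those locations against the intended region. The function name and all values below are hypothetical, and multi-region buckets are not handled.

```python
def find_region_mismatches(expected_region: str, locations: dict) -> dict:
    """Return the entries whose location differs from expected_region.

    The comparison is case-insensitive because bucket locations are
    typically reported in upper case.
    """
    expected = expected_region.strip().lower()
    return {
        name: location
        for name, location in locations.items()
        if location.strip().lower() != expected
    }


# Hypothetical values; substitute the real locations from your project.
mismatches = find_region_mismatches(
    "us-central1",
    {
        "pipeline_root bucket": "US-CENTRAL1",
        "AIPlatformClient region": "europe-west4",
    },
)
print(mismatches)
```

An empty result would suggest the failure lies elsewhere, in which case the worker logs linked from the error message are the next place to look.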