Deploying a model to a regional endpoint on AI Platform Prediction succeeds, but deploying the same model to the global endpoint fails

Problem description

I have a scikit-learn model saved in Cloud Storage that I am trying to deploy with AI Platform Prediction. When I deploy this model to a regional endpoint, the deployment completes successfully:

➜ gcloud ai-platform versions describe regional_endpoint_version --model=regional --region us-central1
Using endpoint [https://us-central1-ml.googleapis.com/]
autoScaling:
  minNodes: 1
createTime: '2020-12-30T15:21:55Z'
deploymentUri: <REMOVED>
description: testing deployment to a regional endpoint
etag: <REMOVED>
framework: SCIKIT_LEARN
isDefault: true
machineType: n1-standard-4
name: <REMOVED>
pythonVersion: '3.7'
runtimeVersion: '2.2'
state: READY

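However, when I try to deploy the exact same model to the global endpoint with the same Python and runtime versions, the deployment fails and reports an error loading the model: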

(aiz) ➜  stanford_nlp_a3 gcloud ai-platform versions describe public_object --model=global
Using endpoint [https://ml.googleapis.com/]
autoScaling: {}
createTime: '2020-12-30T15:12:11Z'
deploymentUri: <REMOVED>
description: testing global endpoint deployment
errorMessage: 'Create Version Failed. Bad model detected with error:  "Error loading
  the model"'
etag: <REMOVED>
framework: SCIKIT_LEARN
machineType: mls1-c1-m2
name: <REMOVED>
pythonVersion: '3.7'
runtimeVersion: '2.2'
state: Failed

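I tried making the .joblib object public to rule out a difference in permissions between the two endpoints as the cause, but the deployment to the global endpoint still failed. I removed the deploymentUri from this post because I have been experimenting with the permissions on the model object, but the path is identical in the two model versions.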

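The machine types for the two deployments have to be different, and for the regional deployment I use min nodes = 1 whereas for the global one I can use min nodes = 0. Other than that and the etags, everything else is exactly the same.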
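To make that comparison concrete, the two `describe` outputs can be diffed field by field. The following is a minimal sketch in Python, assuming the values have been copied by hand from the outputs above into dictionaries. The `diff_versions` helper is hypothetical, and only fields present in both outputs are included; redacted fields and per-version fields such as `createTime` and `description` are left out.

```python
# Fields copied by hand from the two `gcloud ai-platform versions describe`
# outputs shown above (redacted and per-version fields omitted).
regional_version = {
    "autoScaling": {"minNodes": 1},
    "framework": "SCIKIT_LEARN",
    "machineType": "n1-standard-4",
    "pythonVersion": "3.7",
    "runtimeVersion": "2.2",
    "state": "READY",
}
global_version = {
    "autoScaling": {},
    "framework": "SCIKIT_LEARN",
    "machineType": "mls1-c1-m2",
    "pythonVersion": "3.7",
    "runtimeVersion": "2.2",
    "state": "Failed",
}


def diff_versions(a: dict, b: dict) -> dict:
    """Return {field: (value_in_a, value_in_b)} for every field that differs."""
    differing = {}
    for key in sorted(set(a) | set(b)):
        if a.get(key) != b.get(key):
            differing[key] = (a.get(key), b.get(key))
    return differing


version_diff = diff_versions(regional_version, global_version)
for field, (regional_value, global_value) in version_diff.items():
    print(f"{field}: regional={regional_value!r} global={global_value!r}")
```

With these inputs, the only differing fields are the autoscaling settings, the machine type, and the resulting state, which matches the description above.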

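I couldn't find anything on the AI Platform Prediction regional endpoints docs page indicating that certain models can only be deployed to a certain type of endpoint. The "Error loading the model" message doesn't give me much to go on, since it doesn't appear to be a permissions problem with the model file.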

When I add the --log-http option to the create version command, I can see that the error code is 3, but the message doesn't reveal any additional information:

➜  ~ gcloud ai-platform versions create $VERSION_NAME \
  --model=$MODEL_NAME \
  --origin=$MODEL_DIR \
  --runtime-version=2.2 \
  --framework=$FRAMEWORK \
  --python-version=3.7 \
  --machine-type=mls1-c1-m2 --log-http

Using endpoint [https://ml.googleapis.com/]
=======================
==== request start ====
...
...
the final response from the server looks like this:
---- response start ----
status: 200
-- headers start --
<headers>
-- headers end --
-- body start --
{
  "name": "<name>",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.ml.v1.OperationMetadata",
    "createTime": "2020-12-30T22:53:30Z",
    "startTime": "2020-12-30T22:53:30Z",
    "endTime": "2020-12-30T22:54:37Z",
    "operationType": "CREATE_VERSION",
    "modelName": "<name>",
    "version": {
      <version info>
    }
  },
  "done": true,
  "error": {
    "code": 3,
    "message": "Create Version Failed. Bad model detected with error:  \"Error loading the model\""
  }
}

-- body end --
total round trip time (request+response): 0.096 secs
---- response end ----
----------------------
Creating version (this might take a few minutes)......Failed.
ERROR: (gcloud.ai-platform.versions.create) Create Version Failed. Bad model detected with error:  "Error loading the model"
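
For reference, the numeric `code` in the operation's `error` field follows the canonical `google.rpc.Code` enumeration, in which 3 is `INVALID_ARGUMENT`. The sketch below shows one way to pull the status out of such a response body in Python. The `summarize_operation` helper and the `RPC_CODE_NAMES` table are hypothetical names, the table is only a subset of the enumeration, and the sample body is a trimmed copy of the response above with its placeholders left as-is.

```python
import json

# Subset of the canonical google.rpc.Code values; not the full enumeration.
RPC_CODE_NAMES = {
    0: "OK",
    3: "INVALID_ARGUMENT",
    5: "NOT_FOUND",
    7: "PERMISSION_DENIED",
    9: "FAILED_PRECONDITION",
    13: "INTERNAL",
}


def summarize_operation(body: str) -> str:
    """Turn a long-running operation response body into a one-line summary."""
    operation = json.loads(body)
    if not operation.get("done"):
        return "operation still running"
    error = operation.get("error")
    if error is None:
        return "operation succeeded"
    code = error.get("code")
    name = RPC_CODE_NAMES.get(code, "UNKNOWN_CODE")
    return f"{name} ({code}): {error.get('message', '')}"


# Trimmed copy of the response body from the log above.
sample_body = json.dumps({
    "name": "<name>",
    "done": True,
    "error": {
        "code": 3,
        "message": 'Create Version Failed. Bad model detected with error:  '
                   '"Error loading the model"',
    },
})

summary = summarize_operation(sample_body)
print(summary)
```

The code name only says that the service rejected the request as invalid; it does not by itself explain why the model failed to load.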

Can anyone explain what I am missing here?


google-ai-platform google-cloud-ml google-cloud-platform