Azure ML deployment does not see the AzureML environment (wrong version number)

Problem description

I have followed the documentation closely, as outlined here.

I configured a score.py file for inference (not relevant to the problem I am having).

I set up my Azure Machine Learning environment as follows:

from azureml.core import Workspace

# Connect to the workspace
ws = Workspace.from_config()

from azureml.core import Environment
from azureml.core import ContainerRegistry

myenv = Environment(name = "myenv")

myenv.inferencing_stack_version = "latest"  # This will install the inference specific apt packages.

# Docker
myenv.docker.enabled = True
myenv.docker.base_image_registry.address = "myazureregistry.azurecr.io"
myenv.docker.base_image_registry.username = "myusername"
myenv.docker.base_image_registry.password = "mypassword"
myenv.docker.base_image = "4fb3..." 
myenv.docker.arguments = None

# Environment variable (I need Python to look at folders)
myenv.environment_variables = {"PYTHONPATH":"/root"}

# python
myenv.python.user_managed_dependencies = True
myenv.python.interpreter_path = "/opt/miniconda/envs/myenv/bin/python" 

from azureml.core.conda_dependencies import CondaDependencies
conda_dep = CondaDependencies()
conda_dep.add_pip_package("azureml-defaults")
myenv.python.conda_dependencies=conda_dep

myenv.register(workspace=ws) # works!

I then set up the inference configuration:

from azureml.core.model import InferenceConfig
inference_config = InferenceConfig(entry_script="score.py", environment=myenv)

I set up my compute cluster and the deployment configuration:

from azureml.core.compute import ComputeTarget, AksCompute
from azureml.exceptions import ComputeTargetException

# Choose a name for your cluster
aks_name = "theclustername" 

# Check to see if the cluster already exists
try:
    aks_target = ComputeTarget(workspace=ws, name=aks_name)
    print('Found existing compute target')
except ComputeTargetException:
    print('Creating a new compute target...')
    prov_config = AksCompute.provisioning_configuration(vm_size="Standard_NC6_Promo")

    aks_target = ComputeTarget.create(workspace=ws, name=aks_name, provisioning_configuration=prov_config)

    aks_target.wait_for_completion(show_output=True)

from azureml.core.webservice import AksWebservice

# Example
gpu_aks_config = AksWebservice.deploy_configuration(
    autoscale_enabled=False,
    num_replicas=3,
    cpu_cores=4,
    memory_gb=10,
)

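Everything up to this point succeeded. I then tried to deploy the model for inference using the code below, but the deployment fails, saying it cannot find the environment. More specifically, my environment is at version 11, but the deployment keeps trying to fetch an environment whose version number is one higher (version 12):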

from azureml.core.model import Model

model = Model(ws,name="thenameofmymodel")

# Name of the web service that is deployed
aks_service_name = 'tryingtodeply'

# Deploy the model
aks_service = Model.deploy(
    ws,
    aks_service_name,
    models=[model],
    inference_config=inference_config,
    deployment_config=gpu_aks_config,
    deployment_target=aks_target,
    overwrite=True,
)

aks_service.wait_for_deployment(show_output=True)
print(aks_service.state)

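The error output was:

Failed
ERROR - Service deployment polling reached non-successful terminal state, current service state: Failed
Operation ID: 0f03a025-3407-4dc1-9922-a53cc27267d4
More information can be found here:
Error:
{
  "code": "BadRequest",
  "statusCode": 400,
  "message": "The request is invalid",
  "details": [
    {
      "code": "EnvironmentDetailsFetchFailedUserError",
      "message": "Failed to fetch details for Environment with Name: myenv Version: 12."
    }
  ]
}

I tried manually editing the environment JSON to match the version that AzureML is trying to fetch, but nothing worked. Can anyone see what is wrong with this code?

Update

Changing the name of the environment (e.g., my_inference_env) and passing it to InferenceConfig seems to be on the right track. However, the error now changes to the following:

Running..........
Failed
ERROR - Service deployment polling reached non-successful terminal state, current service state: Failed
Operation ID: f0dfc13b-6fb6-494b-91a7-de42b9384692
More information can be found here: https://some_long_http_address_that_leads_to_nothing
Error:
{
  "code": "DeploymentFailed",
  "statusCode": 404,
  "message": "Deployment not found"
}

Solution

The answer below regarding the use of Azure ML environments is indeed correct. However, the last error I was getting occurred because I was setting the container image using the digest value (a sha) and not the image name and tag (e.g., imagename:tag). Note this line in the environment setup code:

myenv.docker.base_image = "4fb3..."

It references the digest value, but it should be changed to:

myenv.docker.base_image = "imagename:tag"

Once I made that change, the deployment succeeded! :)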
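The digest-versus-tag mistake described above can be caught before a deployment is attempted. The following is only a minimal sketch and is not part of the original post or of the AzureML SDK: the helper names (looks_like_bare_digest, has_name_and_tag, check_base_image), the sample digest string, and the hex-length heuristic are all assumptions made for illustration.

```python
import re

# Heuristic (an assumption): a value made only of 12 to 64 hex characters,
# optionally prefixed with "sha256:", is treated as a bare image digest.
_BARE_DIGEST = re.compile(r"^(sha256:)?[0-9a-f]{12,64}$", re.IGNORECASE)


def looks_like_bare_digest(base_image: str) -> bool:
    """Return True if the value looks like a digest rather than an image name."""
    return bool(_BARE_DIGEST.match(base_image.strip()))


def has_name_and_tag(base_image: str) -> bool:
    """Return True if the last path component has the form name:tag."""
    # Only inspect the last component so a registry port such as
    # "localhost:5000/imagename" is not mistaken for a tag.
    last = base_image.strip().rsplit("/", 1)[-1]
    if "@" in last:
        # Digest-pinned references are not name:tag pairs in this sketch.
        return False
    name, sep, tag = last.partition(":")
    return bool(name and sep and tag)


def check_base_image(base_image: str) -> None:
    """Raise ValueError if base_image is not written as imagename:tag."""
    if looks_like_bare_digest(base_image):
        raise ValueError(f"{base_image!r} looks like a digest; use imagename:tag")
    if not has_name_and_tag(base_image):
        raise ValueError(f"{base_image!r} has no explicit tag; use imagename:tag")


if __name__ == "__main__":
    check_base_image("imagename:tag")  # passes silently
    try:
        check_base_image("0123456789abcdef")  # hypothetical digest value
    except ValueError as err:
        print(err)
```

Calling check_base_image on the value right before assigning it to myenv.docker.base_image would surface the problem locally. Because the check is a heuristic, an image whose name happens to be a long hex string would be flagged as well.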

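Answer

One concept that took me a while to grasp is the distinction between registering and using an Azure ML Environment. If you have already registered your environment, myenv, and none of its details have changed, there is no need to re-register it with myenv.register(). You can simply fetch the already registered environment using Environment.get(), like so:

myenv = Environment.get(ws, name='myenv', version=11)

My recommendation would be to give your environment a new name, for example "model_scoring_env". Register it once, then pass it to the InferenceConfig.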
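The "register once, then reuse" advice can be wrapped in a small helper. This is a sketch under stated assumptions rather than the answerer's own code: the function name build_inference_config and its parameters are hypothetical, while Environment.get and InferenceConfig are the same SDK calls already used in this post. The imports are placed inside the function so that the module can be loaded even where the azureml-core package is not installed.

```python
def build_inference_config(ws, env_name, env_version=None, entry_script="score.py"):
    """Fetch an already registered environment and wrap it in an InferenceConfig.

    Nothing is re-registered here, so calling this does not create a new
    environment version.
    """
    # Lazy imports (a choice of this sketch): azureml-core is only needed
    # when the function is actually called.
    from azureml.core import Environment
    from azureml.core.model import InferenceConfig

    # Look up the existing registration by name and, optionally, by version.
    env = Environment.get(workspace=ws, name=env_name, version=env_version)
    return InferenceConfig(entry_script=entry_script, environment=env)
```

With a workspace object ws, a call such as build_inference_config(ws, "model_scoring_env") would return a configuration that can be passed to Model.deploy as inference_config.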
