Azure ML inference schema: "list index out of range" error

Problem description

I have deployed an ML model on Azure ML Studio, and I am updating it with an inference schema to make it compatible with Power BI, as described here.

When sending data to the model through the REST API (before adding this inference schema), everything worked fine and I got results back. However, once I added the schema as described in the instructions linked above and personalized it for my data, the same data sent through the REST API only returns the error "list index out of range". The deployment goes through fine and is reported as "Healthy", with no error message.

Any help would be much appreciated. Thanks.

Edit:

Entry script:

 import numpy as np
 import pandas as pd
 import joblib
 from azureml.core.model import Model
    
 from inference_schema.schema_decorators import input_schema,output_schema
 from inference_schema.parameter_types.standard_py_parameter_type import StandardPythonParameterType
 from inference_schema.parameter_types.numpy_parameter_type import NumpyParameterType
 from inference_schema.parameter_types.pandas_parameter_type import PandasParameterType
    
 def init():
     global model
     #Model name is the name of the model registered under the workspace
     model_path = Model.get_model_path(model_name = 'databricksmodelpowerbi2')
     model = joblib.load(model_path)
    
 #Provide 3 sample inputs for schema generation for 2 rows of data
 numpy_sample_input = NumpyParameterType(np.array([[2400.0,78.26086956521739,11100.0,3.612565445026178,3.0,0.0],[368.55,96.88311688311687,709681.1600000012,73.88059701492537,44.0,0.0]],dtype = 'float64'))
 pandas_sample_input = PandasParameterType(pd.DataFrame({'value': [2400.0,368.55],'delayed_percent': [78.26086956521739,96.88311688311687],'total_value_delayed': [11100.0,709681.1600000012],'num_invoices_per30_dealing_days': [3.612565445026178,73.88059701492537],'delayed_streak': [3.0,44.0],'prompt_streak': [0.0,0.0]}))
 standard_sample_input = StandardPythonParameterType(0.0)
    
 # This is a nested input sample; any item wrapped by `ParameterType` will be described by the schema
 sample_input = StandardPythonParameterType({'input1': numpy_sample_input,'input2': pandas_sample_input,'input3': standard_sample_input})
    
 sample_global_parameters = StandardPythonParameterType(1.0) #this is optional
 sample_output = StandardPythonParameterType([1.0,1.0])
    
 @input_schema('inputs',sample_input)
 @input_schema('global_parameters',sample_global_parameters) #this is optional
 @output_schema(sample_output)
    
 def run(inputs,global_parameters):
     try:
         data = inputs['input1']
         # data will be converted to the target format
         assert isinstance(data,np.ndarray)
         result = model.predict(data)
         return result.tolist()
     except Exception as e:
         error = str(e)
         return error

Prediction script:

 import requests
 import json
 from ast import literal_eval
    
 # URL for the web service
 scoring_uri = ''
 ## If the service is authenticated,set the key or token
 #key = '<your key or token>'
    
 # Data to score
 data = {"data": [[2400.0,0.0]]}
 # Convert to JSON string
 input_data = json.dumps(data)
    
 # Set the content type
 headers = {'Content-Type': 'application/json'}
 ## If authentication is enabled,set the authorization header
 #headers['Authorization'] = f'Bearer {key}'
    
 # Make the request and display the response
 resp = requests.post(scoring_uri, data=input_data, headers=headers)
 print(resp.text)
    
 result = literal_eval(resp.text)
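
One likely cause of the "list index out of range" error is the request body: the prediction script above still posts `{"data": [...]}`, but once the inference schema is added, the service expects the top-level key to match the decorated parameter name and to contain every nested key. A sketch of a body that matches the entry script's schema (the feature values are just the samples from the schema; exact accepted formats may depend on the inference-schema version):

```python
import json

# Top-level key must match @input_schema('inputs', ...); nested keys
# 'input1', 'input2', 'input3' must all be present.
payload = {
    "inputs": {
        "input1": [[2400.0, 78.26086956521739, 11100.0,
                    3.612565445026178, 3.0, 0.0]],
        "input2": {
            "value": [2400.0],
            "delayed_percent": [78.26086956521739],
            "total_value_delayed": [11100.0],
            "num_invoices_per30_dealing_days": [3.612565445026178],
            "delayed_streak": [3.0],
            "prompt_streak": [0.0],
        },
        "input3": 0.0,
    },
    "global_parameters": 1.0,
}
input_data = json.dumps(payload)
```

With a body shaped like this, the schema validation can locate each named input instead of indexing into a list that is not there.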

Solution

I'm not sure whether you have solved this already, but I ran into a similar problem and could not get Power BI to see my ML model. In the end I created a service specifically for Power BI (pandas df type) using the schema shown in the script below.


The Microsoft documentation says: "In order to generate conforming swagger for automated web service consumption, the scoring script run() function must have the API shape of:

A first parameter of type "StandardPythonParameterType", named Inputs and nested.

An optional second parameter of type "StandardPythonParameterType", named GlobalParameters.

Returns a dictionary of type "StandardPythonParameterType", named Results and nested."

I have tested this and it is indeed case sensitive, so it would be something like this:

import numpy as np
import pandas as pd
import joblib

from azureml.core.model import Model
from inference_schema.schema_decorators import input_schema,output_schema
from inference_schema.parameter_types.standard_py_parameter_type import StandardPythonParameterType
from inference_schema.parameter_types.numpy_parameter_type import NumpyParameterType
from inference_schema.parameter_types.pandas_parameter_type import PandasParameterType

def init():
    global model
    # Model name is the name of the model registered under the workspace
    model_path = Model.get_model_path(model_name = 'databricksmodelpowerbi2')
    model = joblib.load(model_path)

# Provide 3 sample inputs for schema generation for 2 rows of data
numpy_sample_input = NumpyParameterType(np.array([[2400.0,78.26086956521739,11100.0,3.612565445026178,3.0,0.0],[368.55,96.88311688311687,709681.1600000012,73.88059701492537,44.0,0.0]],dtype = 'float64'))

pandas_sample_input = PandasParameterType(pd.DataFrame({'value': [2400.0,368.55],'delayed_percent': [78.26086956521739,96.88311688311687],'total_value_delayed': [11100.0,709681.1600000012],'num_invoices_per30_dealing_days': [3.612565445026178,73.88059701492537],'delayed_streak': [3.0,44.0],'prompt_streak': [0.0,0.0]}))

standard_sample_input = StandardPythonParameterType(0.0)

# This is a nested input sample; any item wrapped by `ParameterType` will be described by the schema
sample_input = StandardPythonParameterType({'input1': numpy_sample_input,'input2': pandas_sample_input,'input3': standard_sample_input})

sample_global_parameters = StandardPythonParameterType(1.0) #this is optional

numpy_sample_output = NumpyParameterType(np.array([1.0,2.0]))

# 'Results' is case sensitive
sample_output = StandardPythonParameterType({'Results': numpy_sample_output})

# 'Inputs' is case sensitive
@input_schema('Inputs',sample_input)
# 'GlobalParameters' is also case sensitive
@input_schema('GlobalParameters', sample_global_parameters) # this is optional
@output_schema(sample_output)
def run(Inputs, GlobalParameters):
    try:
        data = Inputs['input1']
        # data will be converted to the target format
        assert isinstance(data, np.ndarray)
        result = model.predict(data)
        # the return value must be a dict keyed 'Results' to match the output schema
        return {'Results': result.tolist()}
    except Exception as e:
        error = str(e)
        return error
