Parameterized SQL queries in Azure ML

Problem description

Background: there appears to be a way to parameterize a DataPath with a PipelineParameter: https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-showcasing-datapath-and-pipelineparameter.ipynb

However, I would like to parameterize my SQL query with a PipelineParameter instead. For example, given this query:

sql_query = """
SELECT id, foo, bar FROM baz
WHERE baz.id BETWEEN 10 AND 20
"""
dataset = Dataset.Tabular.from_sql_query((sql_datastore, sql_query))

I would like to use PipelineParameter to turn 10 and 20 into the parameters param_1 and param_2. Is this possible?

Solution

I found a way to do this:

Pass the parameters to PythonScriptStep:

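In the pipeline definition, where script.py is the script that the step runs and that receives these values as command-line arguments: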

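from azureml.pipeline.core import PipelineParameter
from azureml.pipeline.steps import PythonScriptStep
# The default values apply when no value is supplied at submission time.
param_1 = PipelineParameter(name='min_id', default_value=5)
param_2 = PipelineParameter(name='max_id', default_value=10)
sql_datastore = "sql_datastore"
step = PythonScriptStep(script_name='script.py', arguments=[param_1, param_2, sql_datastore])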
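
On the script side, the three arguments arrive as plain command-line strings. The post does not show the contents of script.py, so the following is only a minimal sketch of what it might look like. The helper names build_query and main are hypothetical. The sketch also assumes that the third argument is the name of a SQL datastore registered in the workspace and that the script runs inside a submitted Azure ML run.

```python
import sys


def build_query(min_id, max_id):
    """Return the SELECT statement for the given id range.

    int() rejects anything that is not a whole number, so only
    integers are ever interpolated into the SQL text.
    """
    low, high = int(min_id), int(max_id)
    if low > high:
        raise ValueError("min_id must not exceed max_id")
    return (
        "SELECT id, foo, bar FROM baz "
        f"WHERE baz.id BETWEEN {low} AND {high}"
    )


def main(argv):
    # Imported here so build_query can be used without the Azure ML SDK.
    from azureml.core import Dataset, Datastore, Run

    min_id, max_id, datastore_name = argv[1:4]
    # Inside a submitted run, the run context gives access to the workspace.
    workspace = Run.get_context().experiment.workspace
    sql_datastore = Datastore.get(workspace, datastore_name)
    # from_sql_query takes a (datastore, query) tuple, as in the question.
    return Dataset.Tabular.from_sql_query(
        (sql_datastore, build_query(min_id, max_id))
    )


# The step passes three arguments: min_id, max_id and the datastore name.
if __name__ == "__main__" and len(sys.argv) >= 4:
    main(sys.argv)
```

Converting the values with int() before formatting the query is a design choice of this sketch rather than something the post prescribes. It keeps arbitrary text from a pipeline parameter out of the SQL string.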