joblib.dump fails when saving a model to a temporary datastore in AMLS

Problem description

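I am using AMLS to train a model. I have a training pipeline in which step 1 trains the model and then saves the output to the temporary datastore model_folder using:
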
os.makedirs(output_folder, exist_ok=True)
output_path = output_folder + "/model.pkl"
joblib.dump(value=model, filename=output_path)

Step 2 loads the model and registers it. The model folder is defined in the pipeline as:

model_folder = PipelineData("model_folder", datastore=ws.get_default_datastore())
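
For context, here is a minimal sketch of how such a two-step pipeline could be wired with the Azure ML SDK v1 (azureml-pipeline packages), so that model_folder is an output of step 1 and an input of step 2. Only train_word2vec.py, --output_folder and the PipelineData definition come from the question; the step names, the register_model.py script, the --model_folder argument and the compute_target parameter are assumptions made for illustration.

```python
def build_pipeline(ws, compute_target, source_directory="."):
    """Wire a train step and a register step through a PipelineData folder.

    Assumes the Azure ML SDK v1 pipeline packages are installed.
    """
    # Imported lazily so this module can be imported without the SDK present.
    from azureml.pipeline.core import Pipeline, PipelineData
    from azureml.pipeline.steps import PythonScriptStep

    # Temporary folder on the workspace's default datastore, as in the question.
    model_folder = PipelineData("model_folder", datastore=ws.get_default_datastore())

    # Step 1: the training script receives the folder path via --output_folder.
    train_step = PythonScriptStep(
        name="train model",  # assumed step name
        script_name="train_word2vec.py",
        source_directory=source_directory,
        arguments=["--output_folder", model_folder],
        outputs=[model_folder],
        compute_target=compute_target,
    )

    # Step 2: a hypothetical script that loads model.pkl and registers it.
    register_step = PythonScriptStep(
        name="register model",  # assumed step name
        script_name="register_model.py",  # hypothetical script
        source_directory=source_directory,
        arguments=["--model_folder", model_folder],  # hypothetical argument
        inputs=[model_folder],
        compute_target=compute_target,
    )

    return Pipeline(workspace=ws, steps=[train_step, register_step])
```

Because model_folder is listed in outputs, the service uploads its contents to the datastore after the training script exits, which matches the driver log further down: the script itself finishes with exit code 0.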

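However, step 1 fails with the following ServiceError when it tries to save the model:

Failed to upload outputs due to Exception: Microsoft.RelInfra.Common.Exceptions.OperationFailedException: Cannot upload output xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx. ---> Microsoft.WindowsAzure.Storage.StorageException: This request is not authorized to perform this operation using this permission.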

How can I fix this? Earlier in my code, I was able to interact with the default datastore using:

default_ds = ws.get_default_datastore()
default_ds.upload_files(...)

My 70_driver_log.txt is as follows:

[2020-08-25T04:03:27.315114] Entering context manager injector.
[context_manager_injector.py] Command line Options: Namespace(inject=['ProjectPythonPath:context_managers.ProjectPythonPath','RunHistory:context_managers.RunHistory','TrackUserError:context_managers.TrackUserError'],invocation=['train_word2vec.py','--output_folder','/mnt/batch/tasks/shared/LS_root/jobs/aiworkspace/azureml/xxxxx/mounts/workspaceblobstore/azureml/xxxxx/model_folder','--model_type','WO','--training_field','task_title','--regex','1','--stopword_removal','--tokenize_basic','0','--remove_punctuation','--autocorrect','--lemmatization','--word_vector_length','152','--model_learning_rate','0.025','--model_min_count','--model_window','7','--num_epochs','10'])
Starting the daemon thread to refresh tokens in background for process with pid = 113
Entering Run History Context Manager.
Current directory:  /mnt/batch/tasks/shared/LS_root/jobs/aiworkspace/azureml/xxxxx/mounts/workspaceblobstore/azureml/xxxxx
Preparing to call script [ train_word2vec.py ] with arguments: ['--output_folder','10']
After variable expansion,calling script [ train_word2vec.py ] with arguments: ['--output_folder','10']

Script type = None
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Unzipping corpora/wordnet.zip.
OUTPUT FOLDER: /mnt/batch/tasks/shared/LS_root/jobs/aiworkspace/azureml/xxxxx/mounts/workspaceblobstore/azureml/xxxxx/model_folder
Loading sql data...
Loading abbreviation data...
/azureml-envs/azureml_xxxxx/lib/python3.6/site-packages/pandas/core/indexing.py:1783: SettingWithcopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self.obj[item_labels[indexer[info_axis]]] = value
Pre-processing data...
Succesfully pre-processed the the text data
Training Word2Vec model...
Saving the model...
Starting the daemon thread to refresh tokens in background for process with pid = 113


The experiment completed successfully. Finalizing run...
[2020-08-25T04:03:52.293994] TimeoutHandler __init__
[2020-08-25T04:03:52.294149] TimeoutHandler __enter__
Cleaning up all outstanding Run operations,waiting 300.0 seconds
2 items cleaning up...
Cleanup took 0.44109439849853516 seconds
[2020-08-25T04:03:52.818991] TimeoutHandler __exit__
2020/08/25 04:04:00 logger.go:293: Process Exiting with Code:  0

My argparse arguments include:

parser.add_argument('--output_folder', type=str, dest='output_folder', default="output_folder", help='output folder')
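
As an illustration of the script side, the sketch below parses --output_folder and writes model.pkl into it, building the path with os.path.join instead of string concatenation. The helper names parse_args and save_model are hypothetical, and the pickle fallback is only an assumption so that the sketch still runs where joblib is not installed. It covers path handling only and would not by itself resolve a storage permission error.

```python
import argparse
import os
import pickle

try:
    import joblib  # the library used in the question
except ImportError:  # assumption: fall back to pickle if joblib is missing
    joblib = None


def parse_args(argv=None):
    """Parse --output_folder; other flags of the real script are ignored."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--output_folder", type=str, dest="output_folder",
                        default="output_folder", help="output folder")
    # parse_known_args tolerates the script's other options, not shown here.
    args, _unknown = parser.parse_known_args(argv)
    return args


def save_model(model, output_folder, filename="model.pkl"):
    """Create the folder if needed and serialize the model into it."""
    os.makedirs(output_folder, exist_ok=True)
    output_path = os.path.join(output_folder, filename)
    if joblib is not None:
        joblib.dump(value=model, filename=output_path)
    else:
        with open(output_path, "wb") as fh:
            pickle.dump(model, fh)
    return output_path
```

In the training script this would be used as `args = parse_args()` followed by `save_model(model, args.output_folder)`.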

Solution

Some ideas:

  1. It is what @drum suggested: a permissions error.
  2. A typo related to your os.path.join(output_folder, 'model.pkl').
  3. Does the same error occur if you use {{1}}?

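I solved this by adding my AMLS workspace to the "Storage Blob Data Contributor" role on the AMLS default storage account. It seems this role is normally assigned by default, but in my case that had not happened.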