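Problem description
I submitted an AutoML run on remote compute (Standard_D12_v2, 4-node cluster, 28 GB and 4 cores per node).
My input file is about 350 MB.
The status stayed at "Preparing" for more than 2 hours, and then the run failed.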
User error: Run timed out. No model completed training in the specified time. Possible solutions:
1) Please check if there are enough compute resources to run the experiment.
2) Increase experiment timeout when creating a run.
3) Subsample your dataset to decrease featurization/training time.
Below is my Python notebook code. Please help.
import logging  # needed for logging.INFO in automl_settings below
import azureml.core
from azureml.core.experiment import Experiment
from azureml.core.workspace import Workspace
from azureml.core.dataset import Dataset
from azureml.core.compute import ComputeTarget
from azureml.train.automl import AutoMLConfig
ws = Workspace.from_config()
experiment=Experiment(ws,'nyc-taxi')
cpu_cluster_name = "low-cluster"
compute_target = ComputeTarget(workspace=ws,name=cpu_cluster_name)
data = "https://betaml4543906917.blob.core.windows.net/betadata/2015_08.csv"
dataset = Dataset.Tabular.from_delimited_files(data)
training_data,validation_data = dataset.random_split(percentage=0.8,seed=223)
label_column_name = 'totalAmount'
automl_settings = {
    "n_cross_validations": 3,
    "primary_metric": 'normalized_root_mean_squared_error',
    "enable_early_stopping": True,
    # This is a limit for testing purposes; please increase it as per cluster size
    "max_concurrent_iterations": 2,
    # This is a time limit for testing purposes; remove it for real use cases,
    # as it will drastically limit the ability to find the best model possible
    "experiment_timeout_hours": 2,
    "verbosity": logging.INFO,
}
automl_config = AutoMLConfig(
    task='regression',
    debug_log='automl_errors.log',
    compute_target=compute_target,
    training_data=training_data,
    label_column_name=label_column_name,
    **automl_settings,
)
remote_run = experiment.submit(automl_config,show_output = False)
Solution
No working solution to this problem has been found yet.
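The third suggestion in the error message (subsampling the dataset) can at least be tried cheaply. The following is a minimal sketch, not a confirmed fix for this run: it uses only the Python standard library to copy the header plus a random fraction of rows from a CSV into a smaller one, which could then be uploaded in place of the 350 MB file. The function name `subsample_csv`, the 10% fraction, and the `tripDistance` column in the demo are assumptions made for illustration; only `totalAmount` and the seed value come from the question. The azureml SDK's `TabularDataset` also appears to offer a `take_sample(probability, seed)` method for the same purpose, which may be more convenient if the data is already registered.

```python
import csv
import io
import random


def subsample_csv(src, dst, fraction, seed=223):
    """Copy the header plus a random ~fraction of the data rows from src to dst.

    src and dst are text file objects; returns the number of data rows kept.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    header = next(reader, None)
    if header is None:
        return 0  # empty input: nothing to copy
    writer.writerow(header)
    kept = 0
    for row in reader:
        # Keep each row independently with probability `fraction`.
        if rng.random() < fraction:
            writer.writerow(row)
            kept += 1
    return kept


# Tiny in-memory demo; the tripDistance column is made up for illustration.
demo_in = io.StringIO(
    "tripDistance,totalAmount\n"
    + "".join(f"{i},{i * 2.5}\n" for i in range(1000))
)
demo_out = io.StringIO()
rows_kept = subsample_csv(demo_in, demo_out, fraction=0.1)
print(f"kept {rows_kept} of 1000 rows")
```

For a real file, the same function can be called with `open("2015_08.csv", newline="")` as `src` and a new file opened for writing as `dst`. If a run on the sample completes, that would suggest the timeout is related to data size rather than to cluster provisioning.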