Dask distributed scheduler - ERROR - Couldn't gather keys

Problem description

import joblib
from dask_ml.model_selection import GridSearchCV
from xgboost import XGBRegressor

# param_grid, df2 (features) and df3 (target) are defined elsewhere
with joblib.parallel_backend('dask'):
    grid_search = GridSearchCV(estimator=XGBRegressor(), param_grid=param_grid, cv=3, n_jobs=-1)
    grid_search.fit(df2, df3)

I created a Dask cluster using two local machines:

import dask.distributed

client = dask.distributed.Client('tcp://191.xxx.xx.xxx:8786')

I am trying to find the best parameters using dask-ml's GridSearchCV, and I get the following error:

distributed.scheduler - ERROR - Couldn't gather keys {"('xgbregressor-fit-score-7cb7087b3aff75a31f487cfe5a9cedb0',1202,2)": ['tcp://127.0.0.1:3738']} state: ['processing'] workers: ['tcp://127.0.0.1:3738']
NoneType: None
distributed.scheduler - ERROR - Workers don't have promised key: ['tcp://127.0.0.1:3738'],('xgbregressor-fit-score-7cb7087b3aff75a31f487cfe5a9cedb0',2)
NoneType: None
distributed.client - WARNING - Couldn't gather 1 keys,rescheduling {"('xgbregressor-fit-score-7cb7087b3aff75a31f487cfe5a9cedb0',2)": ('tcp://127.0.0.1:3738',)}
distributed.nanny - WARNING - Restarting worker
distributed.scheduler - ERROR - Couldn't gather keys {"('xgbregressor-fit-score-7cb7087b3aff75a31f487cfe5a9cedb0',1,2)": ['tcp://127.0.0.1:3730']} state: ['processing'] workers: ['tcp://127.0.0.1:3730']
NoneType: None
distributed.scheduler - ERROR - Couldn't gather keys {"('xgbregressor-fit-score-7cb7087b3aff75a31f487cfe5a9cedb0',1)": ['tcp://127.0.0.1:3730'],"('xgbregressor-fit-score-7cb7087b3aff75a31f487cfe5a9cedb0',5,1)": ['tcp://127.0.0.1:3729'],4,2)": ['tcp://127.0.0.1:3729'],2,1)": ['tcp://127.0.0.1:3730']} state: ['processing','processing','processing'] workers: ['tcp://127.0.0.1:3730','tcp://127.0.0.1:3729']
NoneType: None
distributed.scheduler - ERROR - Couldn't gather keys {'cv-n-samples-7cb7087b3aff75a31f487cfe5a9cedb0': ['tcp://127.0.0.1:3729']} state: ['processing'] workers: ['tcp://127.0.0.1:3729']
NoneType: None
distributed.scheduler - ERROR - Couldn't gather keys {"('xgbregressor-fit-score-7cb7087b3aff75a31f487cfe5a9cedb0',0)": ['tcp://127.0.0.1:3729'],0)": ['tcp://127.0.0.1:3729']} state: ['processing','processing'] workers: ['tcp://127.0.0.1:3729']
NoneType: None
distributed.scheduler - ERROR - Couldn't gather keys {"('xgbregressor-fit-score-7cb7087b3aff75a31f487cfe5a9cedb0',2)": ['tcp://127.0.0.1:3729']} state: ['processing','processing'] workers: ['tcp://127.0.0.1:3729']
NoneType: None
distributed.scheduler - ERROR - Workers don't have promised key: ['tcp://127.0.0.1:3730'],2)
NoneType: None

I hope someone can help me solve this problem. Thanks in advance.

Solution

I ran into the same problem when I tried running Dask locally on an EC2 instance. To solve it, I used:

from distributed import Client
from dask import config

# 'lo' is the loopback interface name, found by running ifconfig in a shell
config.set({'interface': 'lo'})
client = Client()
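
If ifconfig is not available, the interface names can also be listed from Python. The following is only a minimal sketch using the standard library; the helper names list_interfaces and guess_loopback are made up for illustration, and the assumption that the loopback interface is called 'lo' (Linux) or 'lo0' (macOS/BSD) is a naming convention, not something Dask guarantees.

```python
import socket


def list_interfaces():
    """Return the names of the network interfaces known to the OS."""
    return [name for _index, name in socket.if_nameindex()]


def guess_loopback(names):
    """Pick a likely loopback interface by its conventional name.

    Assumption: loopback interfaces are named 'lo' (Linux) or
    'lo0' (macOS/BSD). Returns None if no such name is present.
    """
    for name in names:
        if name in ("lo", "lo0"):
            return name
    return None


if __name__ == "__main__":
    names = list_interfaces()
    print("interfaces:", names)
    print("likely loopback:", guess_loopback(names))
```

The name it reports is what would go in place of 'lo' in the config.set call above.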

This issue helped me find the solution: https://github.com/dask/distributed/issues/1281
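
One detail in the question's log is that every worker is registered at tcp://127.0.0.1, even though the scheduler is reached at a non-loopback address. On a cluster spanning two machines, a loopback address may not be reachable from the other machine, which is one plausible (unconfirmed) cause of this error. The sketch below flags such addresses; loopback_workers is a hypothetical helper name, and the third sample address is invented for illustration.

```python
import ipaddress
from urllib.parse import urlparse


def loopback_workers(worker_addresses):
    """Return the addresses whose host is a loopback IP (127.0.0.0/8 or ::1)."""
    flagged = []
    for address in worker_addresses:
        host = urlparse(address).hostname
        if host is None:
            continue
        try:
            if ipaddress.ip_address(host).is_loopback:
                flagged.append(address)
        except ValueError:
            # Host is a name rather than an IP literal; skip it.
            continue
    return flagged


# Two addresses from the log in the question plus one hypothetical LAN address.
sample = [
    "tcp://127.0.0.1:3738",
    "tcp://127.0.0.1:3730",
    "tcp://192.168.1.20:40123",
]
print(loopback_workers(sample))
```

With a live connection, the addresses could be taken from the keys of client.scheduler_info()["workers"] and passed to the same function.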