Problem description
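I'm a new user of the StackOverflow community, thanks for your help. Here is my situation: I have a model.py file that trains a LightGBM regressor (LGBMRegressor) using sklearn's RandomizedSearchCV. After training, I save the model with pickle.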
import pickle
import numpy as np
import lightgbm as lgb
from sklearn.model_selection import RandomizedSearchCV

# Candidate values for the randomized hyperparameter search
n_estimators = [int(x) for x in np.linspace(start=200, stop=4000, num=20)]
max_depth = [int(x) for x in np.linspace(10, 100, num=10)]
num_leaves = [int(x) for x in np.linspace(10, 150, num=10)]
learning_rate = [0.03, 0.05, 0.1, 0.2, 0.3]
subsample_for_bin = [100000, 200000, 300000, 400000]
random_grid = {'n_estimators': n_estimators, 'max_depth': max_depth, 'num_leaves': num_leaves, 'learning_rate': learning_rate, 'subsample_for_bin': subsample_for_bin}
gbm = lgb.LGBMRegressor()
# Score each candidate on MAE and RMSE; refit the best one by RMSE
gbm_random = RandomizedSearchCV(estimator=gbm, param_distributions=random_grid, scoring=['neg_mean_absolute_error', 'neg_root_mean_squared_error'], refit='neg_root_mean_squared_error', n_iter=100, cv=4, verbose=2, random_state=42, n_jobs=-1)
gbm_random.fit(data_base[features_x], data_base[target_y])
pkl_filename = "../output/lightGBM[3].pkl"
with open(pkl_filename, 'wb') as file:
    pickle.dump(gbm_random, file)
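One way to make the saved artifact less dependent on what each loading script assumes is to pickle the ordered feature list together with the fitted search object. The following is only a minimal sketch, not the poster's code: `save_bundle`, `load_bundle`, and the dictionary keys are hypothetical names, and a plain dict stands in for the fitted `gbm_random` so the snippet runs without LightGBM installed.

```python
import os
import pickle
import tempfile


def save_bundle(path, model, feature_names):
    """Pickle the model together with the ordered feature names it was fit on."""
    bundle = {"model": model, "feature_names": list(feature_names)}
    with open(path, "wb") as fh:
        pickle.dump(bundle, fh)


def load_bundle(path):
    """Load a bundle written by save_bundle and return (model, feature_names)."""
    with open(path, "rb") as fh:
        bundle = pickle.load(fh)
    return bundle["model"], bundle["feature_names"]


# Stand-in for the fitted RandomizedSearchCV object (assumption for illustration).
stand_in_model = {"kind": "placeholder"}
demo_path = os.path.join(tempfile.mkdtemp(), "bundle.pkl")
save_bundle(demo_path, stand_in_model, ["feature_a", "feature_b"])
loaded_model, loaded_features = load_bundle(demo_path)
```

With a bundle like this, any script that loads the model can select columns from the stored feature list rather than rebuilding the list from the incoming data.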
To validate the training, I load the model with pickle in a predict.py file and run it on the test set.
import pickle
import pandas as pd

data_base_test = pd.read_csv("../output/table_test3.csv")
pkl_filename = "../output/lightGBM[3].pkl"
with open(pkl_filename, 'rb') as file:
    gbm = pickle.load(file)
predict_test = gbm.predict(data_base_test[features_x])
print(predict_test)
predict_test is:
[0.66487458 0.82479892 1.89628195 ... 3.83358101 5.21799368 0.33858825]
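I'm comfortable with machine learning, but I'm a complete beginner at web development. When I build a web app with Flask, load the model in a route, and try to predict on the same test set used in the previous script, every prediction from the model has the same value, 66. What could be going wrong? Note: get_json receives the entire test set in JSON format.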
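import pickle

import flask
import numpy as np
import pandas as pd
from flask import request

# Load the pickled model once, at startup
pkl_filename = "model/lightGBM[3].pkl"
with open(pkl_filename, 'rb') as file:
    gbm = pickle.load(file)

app = flask.Flask(__name__, template_folder='templates')

@app.route('/predict', methods=['POST'])
def main():
    test_json = request.get_json()
    df_json = pd.read_json(test_json, orient='records')
    columns_name = df_json.columns.values
    # NOTE: np.where('qtde_venda') tests a string literal, not the column names;
    # np.where(columns_name == 'qtde_venda') is probably what was intended.
    columns_name = np.delete(columns_name, np.where('qtde_venda'))
    features_x = columns_name.tolist()
    # prediction
    predict = gbm.predict(df_json[features_x])
    print(predict)
    return flask.render_template('main.html')

if __name__ == '__main__':
    app.run()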
The prediction vector is:
[66. 66. 66. ... 66. 66. 66.]
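Independently of the eventual fix, the Flask route rebuilds the feature list from the columns of the incoming JSON, whereas predict.py uses the `features_x` list from training. In the route, `np.where('qtde_venda')` is evaluated on the string literal itself and never compares it against the column names, so the target column is not located by name. A hedged sketch of a stricter approach follows: it selects values by name, in the training order, and fails loudly when a feature is missing. The function name `rows_in_training_order`, the feature names `a` and `b`, and the sample payload are all hypothetical.

```python
import json


def rows_in_training_order(records, feature_names):
    """Return one list of values per record, ordered like feature_names.

    Raises KeyError if a record lacks a feature, instead of silently
    predicting on the wrong columns.
    """
    rows = []
    for i, record in enumerate(records):
        missing = [name for name in feature_names if name not in record]
        if missing:
            raise KeyError(f"record {i} is missing features: {missing}")
        rows.append([record[name] for name in feature_names])
    return rows


# Hypothetical payload: keys arrive in a different order than at training
# time and include the target column 'qtde_venda', which is ignored here.
payload = json.loads('[{"qtde_venda": 3, "b": 2.0, "a": 1.0}]')
rows = rows_in_training_order(payload, ["a", "b"])
```

The resulting rows could then be wrapped in `pd.DataFrame(rows, columns=feature_names)` before calling `gbm.predict`, so the model sees the same columns in the same order as in predict.py.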
Solution
I can't explain exactly what happened, but the error was caused by the Anaconda environment. To fix it, I removed Anaconda and switched to a Python venv.
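The answer does not pin down what differed between the Anaconda environment and the venv. One thing worth checking in a situation like this is whether the packages that unpickle and run the model have the same versions as the ones used for training, since pickled estimators are not guaranteed to behave identically across library versions. The sketch below records installed versions and reports mismatches; the helper names and the package list are assumptions for illustration, not something taken from the original post.

```python
import sys
from importlib import metadata


def environment_snapshot(packages):
    """Map each package name to its installed version (None if not installed)."""
    snapshot = {"python": "%d.%d.%d" % sys.version_info[:3]}
    for name in packages:
        try:
            snapshot[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            snapshot[name] = None
    return snapshot


def environment_diff(saved, current):
    """Return {name: (saved_version, current_version)} for every mismatch."""
    names = sorted(set(saved) | set(current))
    return {
        name: (saved.get(name), current.get(name))
        for name in names
        if saved.get(name) != current.get(name)
    }


# Package list is an assumption; adjust it to what the model actually imports.
PACKAGES = ["lightgbm", "scikit-learn", "numpy", "pandas"]
training_env = environment_snapshot(PACKAGES)  # would be stored next to the pickle
serving_env = environment_snapshot(PACKAGES)   # would be computed in the Flask app
mismatches = environment_diff(training_env, serving_env)
```

Here both snapshots come from the same interpreter, so `mismatches` is empty; in practice the training snapshot would be saved when the model is pickled and compared against the serving environment at startup.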