Hyperopt: when I load a saved model for sklearn, how do I know which variables were selected for the best model?

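Problem description

I trained a sklearn Gradient Boosting classifier and tuned it with Hyperopt. Hyperopt selected only 20 of the 769 variables. However, when I load the saved model for sklearn and run it on a blind test, it is not clear which variables were selected. The code is as follows: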

from xgboost import XGBClassifier

from hyperopt import fmin, tpe, hp, STATUS_OK, Trials
from sklearn.model_selection import cross_val_score
from sklearn.metrics import accuracy_score, precision_score, confusion_matrix, f1_score, recall_score

# multi:mlogloss // binary:logistic

# Validation set referenced inside accuracy(); X_test and y_test are not shown in this post.
eval_set = [(X_test, y_test)]

def accuracy(params):
    # Train one candidate model and return its accuracy on the test split.
    clf = XGBClassifier(**params, learning_rate=0.7, objective='binary:logistic', booster='gbtree', n_jobs=64, eval_metric="error", eval_set=eval_set, verbose=True)
    clf.fit(X_train, y_train)  # eval_set=eval_set,
    return clf.score(X_test, y_test)

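parameters = {
    'n_estimators': hp.choice('n_estimators', range(20, 40)),
    'max_depth': hp.choice('max_depth', range(4, 100)),
    'gamma': hp.choice('gamma', range(0, 10)),
    # The range(...) call was truncated in the source; the lower bound 0 is an assumption.
    "min_child_weight": hp.choice("min_child_weight", range(0, 1)),
    "num_features": hp.choice("num_features", range(10, X_train.shape[1])),
    # Same truncation here; the lower bound 0 is an assumption.
    "max_delta_step": hp.choice("max_delta_step", range(0, 10)),
}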


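best = 0
def f(params):
    # Objective for fmin: Hyperopt minimizes, so the accuracy is negated.
    global best
    acc = accuracy(params)
    if acc > best:
        best = acc
    print('Improving:', best, params)
    return {'loss': -acc, 'status': STATUS_OK}

trials = Trials()

# Note: this rebinds `best` from the running best accuracy to fmin's result dict.
best = fmin(f, parameters, algo=tpe.suggest, max_evals=80, trials=trials)
print('best:', best)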

# Refit a final model with the parameters reported by fmin.
clf = XGBClassifier(
    gamma=best['gamma'],
    max_delta_step=best['max_delta_step'],
    max_depth=best['max_depth'],
    learning_rate=0.1,
    n_estimators=best['n_estimators'],
    min_child_weight=best['min_child_weight'],
    num_features=best['num_features'],
    verbose=True,
)
clf.fit(X_train, y_train)
clf.score(X_test, y_test)

import joblib
filename = '/home/rubens.../modelos/Argumenta_Multi.sav'

# Save the fitted model, then load it back and predict on new data.
joblib.dump(clf, filename)

loaded_model = joblib.load(filename)
result = loaded_model.predict(X_new)

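How do I know which 20 variables hyperopt selected? I am afraid to use chi-squared (SelectKBest with k=20) and save the hyperopt weights, because hyperopt may not have used chi-squared for variable selection.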
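
One way to remove the guesswork, sketched here under assumptions, is to make the selection step explicit and save it together with the classifier. As far as I know, num_features is not a documented XGBClassifier parameter, so the search above may not be choosing columns at all; that is worth verifying. In the sketch below, a SelectKBest(chi2, k=20) step and a GradientBoostingClassifier are wrapped in one sklearn Pipeline, the pipeline is saved with joblib, and the selected column indices are read back with get_support(). The synthetic data, the temporary file path, and the step names "select" and "clf" are all hypothetical.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.pipeline import Pipeline

# Synthetic stand-in data: 200 rows, 50 non-negative features (chi2 needs values >= 0).
rng = np.random.default_rng(0)
X = rng.integers(0, 5, size=(200, 50)).astype(float)
y = (X[:, 3] + X[:, 7] > 4).astype(int)

# Selection and classification live in one object, so they are saved together.
pipe = Pipeline([
    ("select", SelectKBest(chi2, k=20)),
    ("clf", GradientBoostingClassifier(n_estimators=20, random_state=0)),
])
pipe.fit(X, y)

path = os.path.join(tempfile.mkdtemp(), "pipeline.joblib")
joblib.dump(pipe, path)
loaded = joblib.load(path)

# Boolean mask over the original columns, converted to column indices.
selected = np.flatnonzero(loaded.named_steps["select"].get_support())
print("selected columns:", selected)

# The loaded pipeline accepts the full-width matrix and selects columns itself.
pred = loaded.predict(X)
```

With this layout the loaded object can be given all original columns at prediction time, and k could itself be tuned by Hyperopt by setting it on the pipeline for each trial.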

At the line result = loaded_model.predict(X_new) I get the following error:

ValueError: X has 769 features, but DecisionTreeClassifier is expecting 20 features as input.
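
This message suggests that the loaded estimator was fitted on a 20-column matrix while X_new still has all 769 columns. A fitted sklearn estimator records the width it was trained on in n_features_in_, which can be checked before predicting. The sketch below reproduces the mismatch on synthetic data and then predicts on the matching columns; kept_columns is a hypothetical stand-in for whichever 20 columns were actually used during training, which would have to be saved alongside the model.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a 769-column dataset.
rng = np.random.default_rng(0)
X_full = rng.normal(size=(100, 769))
y = (X_full[:, 0] > 0).astype(int)

# Hypothetical: the 20 column indices that were used at training time.
kept_columns = np.arange(20)
model = DecisionTreeClassifier(random_state=0).fit(X_full[:, kept_columns], y)

# Number of columns the fitted model expects.
print("expects:", model.n_features_in_)

# Passing the full-width matrix triggers the feature-count error.
try:
    model.predict(X_full)
    mismatch_raised = False
except ValueError as exc:
    mismatch_raised = True
    print("error:", exc)

# Selecting the same columns as in training makes the shapes agree.
pred = model.predict(X_full[:, kept_columns])
```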

I also don't know whether Hyperopt follows sklearn's feature importances, which I looked at earlier, before saving the best Hyperopt model:

model.feature_importances_
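
For context, feature_importances_ is computed by the fitted tree ensemble itself, while Hyperopt only proposes hyperparameter values. The attribute therefore describes whichever model was last fitted and can be read from a loaded model as well. A minimal sketch, assuming a sklearn GradientBoostingClassifier and synthetic data (all names and values here are hypothetical), of listing the columns the trees actually split on:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic data in which only columns 2 and 5 carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 30))
y = (X[:, 2] - X[:, 5] > 0).astype(int)

model = GradientBoostingClassifier(n_estimators=30, max_depth=2, random_state=0)
model.fit(X, y)

importances = model.feature_importances_

# Columns with non-zero importance are the ones used in at least one split.
used = np.flatnonzero(importances > 0)

# Order the used columns from most to least important.
ranked = used[np.argsort(importances[used])[::-1]]
print("columns used by the trees:", ranked)
```

A column with zero importance was never used in a split, but the model still expects it in the input matrix, so this ranking does not by itself reduce the number of input features.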

Solution

No working solution to this problem has been found yet.

hyperopt python scikit-learn
