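Problem description
I have a feature matrix (X_train_balanced) and a target vector (y_train_balanced) for a classification task with 3 classes. To perform model selection and hyperparameter tuning, I plan to use sklearn's GridSearchCV on each of the models I want to compare (LR, SVC, RF and KNN).
My idea is then to compare the best result GridSearchCV produces for each model and pick the best model from those.
I would like to know whether this approach makes sense, and whether the code I wrote for the task is correct.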
Model search space
models = {
    'LogisticRegression': LogisticRegression(),
    'SVM': SVC(),
    'RandomForestClassifier': RandomForestClassifier(),
    'KNN': KNeighborsClassifier(),
}
Hyperparameter search space
hyper = {
    'LogisticRegression': {
        'penalty': ['l2'],
        'C': np.logspace(0, 4, 10),
        'solver': ['lbfgs', 'liblinear', 'saga'],
        'class_weight': ['balanced'],
        'random_state': [0],
    },
    'SVM': {
        'C': [0.01, 0.1, 1, 10, 100, 1000],
        'gamma': [1, 0.01, 0.001, 0.0001],
        'kernel': ['rbf', 'linear'],
    },
    'RandomForestClassifier': {
        'max_depth': [2, 3, 4],
        # 'auto' (identical to 'sqrt' for classifiers) was removed in scikit-learn 1.3
        'max_features': [2, 'sqrt'],
        'n_estimators': [10, 500],
    },
    'KNN': {
        'n_neighbors': [5, 15, 20],
        'weights': ['uniform', 'distance'],
    },
}
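Before running the search it can help to know how much work each grid implies. A minimal sketch, using `sklearn.model_selection.ParameterGrid` to count the candidates in a copy of the grids above (with the RandomForest `'auto'` entry dropped, since it equals `'sqrt'` for classifiers and is rejected by scikit-learn 1.3+; keeping it would give 18 RF candidates instead of 12):

```python
import numpy as np
from sklearn.model_selection import ParameterGrid

# Copy of the question's grids, used here only to count candidate settings
hyper = {
    'LogisticRegression': {'penalty': ['l2'], 'C': np.logspace(0, 4, 10),
                           'solver': ['lbfgs', 'liblinear', 'saga'],
                           'class_weight': ['balanced'], 'random_state': [0]},
    'SVM': {'C': [0.01, 0.1, 1, 10, 100, 1000],
            'gamma': [1, 0.01, 0.001, 0.0001],
            'kernel': ['rbf', 'linear']},
    'RandomForestClassifier': {'max_depth': [2, 3, 4],
                               'max_features': [2, 'sqrt'],
                               'n_estimators': [10, 500]},
    'KNN': {'n_neighbors': [5, 15, 20], 'weights': ['uniform', 'distance']},
}

cv = 5
# Number of candidates = product of the lengths of each parameter list
counts = {name: len(ParameterGrid(grid)) for name, grid in hyper.items()}
for name, n in counts.items():
    print(f"{name}: {n} candidates x {cv} folds = {n * cv} fits")

# GridSearchCV also refits the best candidate once per model (refit=True by default)
total_fits = sum(counts.values()) * cv
print("total CV fits:", total_fits)
```

Here that is 30 + 48 + 12 + 6 = 96 candidates, i.e. 480 cross-validation fits plus one refit per model; the SVM grid alone accounts for half of them.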
Cross-validation for each model
for model_name in models.keys():
    # Model selection
    clf = models[model_name]
    params = hyper[model_name]

    # Pipeline (standardization + classifier)
    pipe = Pipeline([('scaler', StandardScaler()), ('clf', clf)])

    # Gridsearch cross-validation
    grid = GridSearchCV(estimator=clf, param_grid=params, cv=5, return_train_score=True)
    grid.fit(X_train_balanced, y_train_balanced)

    # Gridsearch cross-validation results
    best_param = grid.best_params_
    best_param_test_score_mean = grid.cv_results_['mean_test_score'][grid.best_index_]
    best_param_test_score_std = grid.cv_results_['std_test_score'][grid.best_index_]
    best_param_train_score_mean = grid.cv_results_['mean_train_score'][grid.best_index_]
    best_param_train_score_std = grid.cv_results_['std_train_score'][grid.best_index_]
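The loop above overwrites `best_param` and the score variables on every iteration, so only the last model's values remain afterwards. To compare the models as described in the question, one option is to store each model's best row and sort by mean test score. A minimal sketch, assuming the iris dataset (3 classes) as a stand-in for X_train_balanced/y_train_balanced and using small, illustrative grids (not the question's) so it runs quickly:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Stand-in data (assumption): replace with X_train_balanced, y_train_balanced
X, y = load_iris(return_X_y=True)

models = {
    'LogisticRegression': LogisticRegression(max_iter=1000),
    'SVM': SVC(),
    'RandomForestClassifier': RandomForestClassifier(random_state=0),
    'KNN': KNeighborsClassifier(),
}
# Reduced, illustrative grids
hyper = {
    'LogisticRegression': {'C': [1.0, 10.0]},
    'SVM': {'C': [0.1, 1, 10], 'kernel': ['rbf', 'linear']},
    'RandomForestClassifier': {'max_depth': [2, 3], 'n_estimators': [10, 50]},
    'KNN': {'n_neighbors': [5, 15], 'weights': ['uniform', 'distance']},
}

rows = []
for model_name, clf in models.items():
    grid = GridSearchCV(estimator=clf, param_grid=hyper[model_name],
                        cv=5, return_train_score=True)
    grid.fit(X, y)
    i = grid.best_index_  # row of cv_results_ for the best candidate
    rows.append({
        'model': model_name,
        'best_params': grid.best_params_,
        'mean_test_score': grid.cv_results_['mean_test_score'][i],
        'std_test_score': grid.cv_results_['std_test_score'][i],
        'mean_train_score': grid.cv_results_['mean_train_score'][i],
        'std_train_score': grid.cv_results_['std_train_score'][i],
    })

# Highest mean CV test score first
rows.sort(key=lambda r: r['mean_test_score'], reverse=True)
for r in rows:
    print(f"{r['model']:<25} {r['mean_test_score']:.3f} +/- {r['std_test_score']:.3f}  {r['best_params']}")
```

Keeping the std next to the mean makes it easier to see when two models are within noise of each other; the train scores can be compared with the test scores to spot overfitting.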
Solution
Your code works as long as you have the required imports:
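import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import GridSearchCV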
I think the approach makes sense, too.
At the end of the code you can add the lines
print(best_param)
print(grid.best_estimator_)
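to get the best parameters and the best-performing estimator. I ran the code with these changes; for example, on one dataset I tested, the output (for the last model in the loop, KNN) was: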
{'n_neighbors': 15,'weights': 'uniform'}
KNeighborsClassifier(algorithm='auto',leaf_size=30,metric='minkowski',metric_params=None,n_jobs=None,n_neighbors=15,p=2,weights='uniform')
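One detail worth checking: in the question's loop, `pipe` is built but `GridSearchCV` receives `clf`, so the `StandardScaler` step is never used. That matches the output above, where `best_estimator_` is a bare `KNeighborsClassifier` rather than a `Pipeline`. If the intent is to standardize inside each CV fold, a minimal sketch is to pass the pipeline and prefix every grid key with the step name (`clf__`), which is how scikit-learn addresses parameters of pipeline steps. Iris again stands in for the question's data (assumption):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in data (assumption): replace with X_train_balanced, y_train_balanced
X, y = load_iris(return_X_y=True)

# Grid written without prefixes, like the question's KNN entry
knn_params = {'n_neighbors': [5, 15, 20], 'weights': ['uniform', 'distance']}

# Pipeline parameters are addressed as '<step name>__<parameter>'
params = {f'clf__{key}': values for key, values in knn_params.items()}

# The scaler is re-fitted on the training part of every fold
pipe = Pipeline([('scaler', StandardScaler()), ('clf', KNeighborsClassifier())])
grid = GridSearchCV(estimator=pipe, param_grid=params, cv=5, return_train_score=True)
grid.fit(X, y)

print(grid.best_params_)     # keys carry the 'clf__' prefix
print(grid.best_estimator_)  # a fitted Pipeline (scaler + KNN)
```

In the question's loop the same dict comprehension can be applied to `hyper[model_name]`, so the grids themselves can stay unprefixed.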