Problem description
I am using GridSearchCV to do feature selection (SelectKBest) for a linear regression. The results show that 10 features were selected (via .best_params_), but I am not sure how to display which features those are.
The code is pasted below. I am using a pipeline because the next model will also need hyperparameter selection. x_train is a DataFrame with 12 columns that I cannot share due to data restrictions.
cv_folds = KFold(n_splits=5,shuffle=False)
steps = [('feature_selection',SelectKBest(mutual_info_regression,k=3)),('regr',LinearRegression())]
pipe = Pipeline(steps)
search_space = [{'feature_selection__k': [1,2,3,4,5,6,7,8,9,10,11,12]}]
clf = GridSearchCV(pipe,search_space,scoring='neg_mean_squared_error',cv=5,verbose=0)
clf = clf.fit(x_train,y_train)
print(clf.best_params_)
Solution
You can access information about the feature_selection step like this:
<GridSearch_model_variable>.best_estimator_.named_steps[<feature_selection_step>]
So in your case it would be:
print(clf.best_estimator_.named_steps['feature_selection'])
# Output: SelectKBest(k=2, score_func=<function mutual_info_regression at 0x13d37b430>)
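Beyond printing the fitted selector, you can also inspect the score each feature received via the selector's scores_ attribute (standard SelectKBest API). Here is a minimal, self-contained sketch; the column names ('a', 'b', 'c') and the toy data are invented for illustration:

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectKBest, mutual_info_regression

# Toy data: the target depends on 'a' and 'c' only; 'b' is pure noise
rng = np.random.RandomState(0)
X = pd.DataFrame({'a': rng.rand(200), 'b': rng.rand(200), 'c': rng.rand(200)})
y = 3 * X['a'] + 2 * X['c']

selector = SelectKBest(mutual_info_regression, k=2).fit(X, y)
print(dict(zip(X.columns, selector.scores_)))  # mutual-information score per column
print(X.columns[selector.get_support()])       # the k columns that were kept
```

The scores_ array is in the same order as the input columns, so zipping it with X.columns makes it easy to see why a feature was kept or dropped.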
Next, you can use get_support to obtain a boolean mask of the selected features:
print(clf.best_estimator_.named_steps['feature_selection'].get_support())
# Output: array([ True, False, False, False, False, False, False, False, False, False, False, False,  True])
Now apply this mask to the original columns:
data_columns = X.columns # List of columns in your dataset
# This is the original list of columns
print(data_columns)
# Output: ['CRIM','ZN','INDUS','CHAS','NOX','RM','AGE','DIS','RAD','TAX','PTRATIO','B','LSTAT']
# Now print the selected columns
print(data_columns[clf.best_estimator_.named_steps['feature_selection'].get_support()])
# Output: Index(['CRIM', 'LSTAT'], dtype='object')
So you can see that only 2 of the 13 features were selected (k=2 was the best case on my data).
Here is the full code for the Boston dataset:
import pandas as pd
from sklearn.datasets import load_boston  # note: load_boston was removed in scikit-learn 1.2
from sklearn.model_selection import KFold
from sklearn.feature_selection import SelectKBest,mutual_info_regression
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV
boston_dataset = load_boston()
X = pd.DataFrame(boston_dataset.data,columns=boston_dataset.feature_names)
y = boston_dataset.target
cv_folds = KFold(n_splits=5,shuffle=False)
steps = [('feature_selection',SelectKBest(mutual_info_regression,k=3)),('regr',LinearRegression())]
pipe = Pipeline(steps)
search_space = [{'feature_selection__k': list(range(1, 14))}]  # the Boston dataset has 13 features
clf = GridSearchCV(pipe, search_space, scoring='neg_mean_squared_error', cv=cv_folds, verbose=0)
clf = clf.fit(X,y)
print(clf.best_params_)
data_columns = X.columns
selected_features = data_columns[clf.best_estimator_.named_steps['feature_selection'].get_support()]
print(selected_features)
# Output : Index(['CRIM','LSTAT'],dtype='object')