How do I use SMOTE and feature selection together in an sklearn pipeline?

Problem description

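from imblearn.pipeline import Pipeline
from imblearn.over_sampling import SMOTE
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, mutual_info_classif

smt = SMOTE(random_state=0)

# preprocessor is defined elsewhere in the asker's code (not shown in the question)
pipeline_rf_smt_fs = Pipeline(
    [
        ('preprocess', preprocessor),
        ('selector', SelectKBest(mutual_info_classif, k=30)),
        ('smote', smt),
        ('rf_classifier', RandomForestClassifier(n_estimators=600, random_state=2021)),
    ]
)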

I get the following error: All intermediate steps should be transformers and implement fit and transform or be the string 'passthrough' 'SMOTE(random_state=0)' (type ) doesn't

I believe SMOTE has to be applied after the feature selection step. Any help with this would be much appreciated.

Solution

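This is the error message raised by the scikit-learn version of Pipeline, not the imbalanced-learn one. Your code as written should not produce this error, but you have probably run from sklearn.pipeline import Pipeline somewhere afterwards, which overrides the Pipeline object imported from imblearn.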

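From a methodological point of view, I still find it questionable to use a sampler after preprocessing and feature selection in a general setting. What if the features you select are only relevant because of the imbalance in the dataset? I would rather use the sampler as the first step of the pipeline (but this is up to you, and it will not cause any error).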
