Adjusting polynomial features for logistic regression in Python

Question

I want to add second-degree polynomial terms to my logistic regression model, which has two predictors. This is what I tried:

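from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures

df_poly = df[['Y', 'x0', 'x1']].copy()
X_train, X_test, Y_train, Y_test = train_test_split(
    df_poly.drop('Y', axis=1), df_poly['Y'], test_size=0.20, random_state=10)

poly = PolynomialFeatures(degree=2, interaction_only=False, include_bias=False)
lr = LogisticRegression()
pipe = Pipeline([('polynomial_features', poly), ('logistic_regression', lr)])
pipe.fit(X_train, Y_train)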

This gives me coefficients for x0, x1, x0^2, x1^2 and x0*x1.

Instead, I want to fit only x0, x1, x0^2 and x0*x1. In other words, I want to exclude the x1^2 term. Is there a way to do this with scikit-learn?
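Written out, the model being asked for is the one below. The β coefficients are illustrative notation. The intercept β0 is supplied by LogisticRegression itself (fit_intercept=True by default), which is why include_bias=False is passed to PolynomialFeatures:

```latex
\operatorname{logit} P(Y = 1 \mid x_0, x_1)
  = \beta_0 + \beta_1 x_0 + \beta_2 x_1 + \beta_3 x_0^2 + \beta_4 x_0 x_1
```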

Answer

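I would combine ColumnTransformer, PolynomialFeatures and FunctionTransformer. PolynomialFeatures with interaction_only=True is applied to both columns, which yields X0, X1 and X0*X1, and a FunctionTransformer squares X0 alone, so X1^2 is never created: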

import numpy as np
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import PolynomialFeatures, FunctionTransformer

X = pd.DataFrame({'X0': np.arange(10), 'X1': np.arange(10, 20)})
ct = ColumnTransformer([
    # X0, X1 and the interaction X0*X1 (no squared terms)
    ('poly_X0X1', PolynomialFeatures(degree=2, interaction_only=True, include_bias=False), ['X0', 'X1']),
    # X0^2 only
    ('poly_x0', FunctionTransformer(func=np.square), ['X0']),
])
poly = ct.fit_transform(X)
poly  # columns: X0, X1, X0*X1, X0^2
array([[  0.,  10.,   0.,   0.],
       [  1.,  11.,  11.,   1.],
       [  2.,  12.,  24.,   4.],
       [  3.,  13.,  39.,   9.],
       [  4.,  14.,  56.,  16.],
       [  5.,  15.,  75.,  25.],
       [  6.,  16.,  96.,  36.],
       [  7.,  17., 119.,  49.],
       [  8.,  18., 144.,  64.],
       [  9.,  19., 171.,  81.]])
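The answer stops at the feature matrix. Below is a minimal sketch of plugging the same kind of ColumnTransformer into the question's Pipeline in place of the plain PolynomialFeatures step. The synthetic df, its labeling rule, and the step names 'features', 'poly_x0x1' and 'square_x0' are assumptions made for illustration. np.square is used for the squaring step (equivalent to lambda x: x**2, but picklable):

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, PolynomialFeatures

# Hypothetical synthetic data standing in for the question's df (columns Y, x0, x1)
rng = np.random.default_rng(0)
df = pd.DataFrame({'x0': rng.normal(size=200), 'x1': rng.normal(size=200)})
df['Y'] = (df['x0'] + 0.5 * df['x0'] ** 2 - df['x1'] > 0).astype(int)

X_train, X_test, Y_train, Y_test = train_test_split(
    df[['x0', 'x1']], df['Y'], test_size=0.20, random_state=10)

# x0, x1, x0*x1 from both columns, plus x0^2 from x0 alone; x1^2 is never created
features = ColumnTransformer([
    ('poly_x0x1', PolynomialFeatures(degree=2, interaction_only=True, include_bias=False), ['x0', 'x1']),
    ('square_x0', FunctionTransformer(np.square), ['x0']),
])

pipe = Pipeline([
    ('features', features),
    ('logistic_regression', LogisticRegression()),
])
pipe.fit(X_train, Y_train)

# One coefficient per engineered column, in the order x0, x1, x0*x1, x0^2
coefs = pipe.named_steps['logistic_regression'].coef_
print(coefs.shape)  # (1, 4)
print(pipe.score(X_test, Y_test))
```

Because x1^2 is never generated, the logistic regression is fitted on exactly four features, and the coefficient order follows the order of the transformers listed in the ColumnTransformer.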