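Problem description
Suppose I want to include second-degree polynomial terms in my logistic model (which has two predictors), for example as I tried below: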
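from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures

df_poly = df[['Y', 'x0', 'x1']].copy()
X_train, X_test, Y_train, Y_test = train_test_split(df_poly.drop('Y', axis=1), df_poly['Y'], test_size=0.20, random_state=10)
poly = PolynomialFeatures(degree=2, interaction_only=False, include_bias=False)
lr = LogisticRegression()
pipe = Pipeline([('polynomial_features', poly), ('logistic_regression', lr)])
pipe.fit(X_train, Y_train)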
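This gives me coefficients for x0, x1, x0^2, x1^2, and x0*x1.
Instead, I want to adjust the process so that I only fit x0, x1, x0^2, and x0*x1. In other words, I want to exclude the x1^2 term. Is there a way to do this with the sklearn library?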
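Solution
I would use a combination of ColumnTransformer, PolynomialFeatures, and FunctionTransformer. PolynomialFeatures with interaction_only=True produces X0, X1, and X0*X1 without any squared terms, and FunctionTransformer adds X0^2 separately: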
import numpy as np
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import PolynomialFeatures, FunctionTransformer

X = pd.DataFrame({'X0': np.arange(10), 'X1': np.arange(10, 20)})

ct = ColumnTransformer([
    # X0, X1 and the interaction X0*X1 (interaction_only=True omits the squares)
    ('poly_X0X1', PolynomialFeatures(degree=2, interaction_only=True, include_bias=False), ['X0', 'X1']),
    # X0^2 only
    ('poly_x0', FunctionTransformer(func=lambda x: x**2), ['X0']),
])

poly = ct.fit_transform(X)
poly  # columns: X0, X1, X0*X1, X0^2
array([[  0.,  10.,   0.,   0.],
       [  1.,  11.,  11.,   1.],
       [  2.,  12.,  24.,   4.],
       [  3.,  13.,  39.,   9.],
       [  4.,  14.,  56.,  16.],
       [  5.,  15.,  75.,  25.],
       [  6.,  16.,  96.,  36.],
       [  7.,  17., 119.,  49.],
       [  8.,  18., 144.,  64.],
       [  9.,  19., 171.,  81.]])
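To connect this back to the question, the same ColumnTransformer can take the place of the plain PolynomialFeatures step in the logistic regression pipeline. The following is only a minimal sketch: the data is synthetic and made up for illustration, the column names x0, x1, and Y follow the question, and the helper `square` is a hypothetical named function used instead of the lambda (a named function keeps the fitted pipeline picklable).

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, PolynomialFeatures


def square(x):
    """Element-wise square (hypothetical helper replacing the lambda)."""
    return x ** 2


# Synthetic data for illustration only; it is not from the question.
rng = np.random.default_rng(10)
df = pd.DataFrame({"x0": rng.normal(size=200), "x1": rng.normal(size=200)})
noise = rng.normal(scale=0.5, size=200)
df["Y"] = (df["x0"] + df["x0"] * df["x1"] + noise > 0).astype(int)

X_train, X_test, Y_train, Y_test = train_test_split(
    df[["x0", "x1"]], df["Y"], test_size=0.20, random_state=10
)

features = ColumnTransformer([
    # x0, x1 and the interaction x0*x1 (no pure squared terms)
    ("poly_x0x1",
     PolynomialFeatures(degree=2, interaction_only=True, include_bias=False),
     ["x0", "x1"]),
    # x0^2 only; x1^2 is never generated
    ("sq_x0", FunctionTransformer(func=square), ["x0"]),
])

pipe = Pipeline([
    ("features", features),
    ("logistic_regression", LogisticRegression()),
])
pipe.fit(X_train, Y_train)

# One coefficient per generated column, in the order x0, x1, x0*x1, x0^2.
coefs = pipe.named_steps["logistic_regression"].coef_
n_features = pipe.named_steps["features"].transform(X_test).shape[1]
```

Because x1^2 is never generated, the model has four coefficients rather than five, and there is no need to drop a column after the fact.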