Problem description
I want to create some random data and try to improve my model with PolynomialFeatures, but I am having a little trouble doing so.
from sklearn.preprocessing import PolynomialFeatures
from sklearn.model_selection import train_test_split
import random
import pandas as pd
import numpy as np
import statsmodels.api as sm
# create some artificial data
x=np.linspace(-1,1,1000)
x=pd.array(random.choices(x,k=1000))
y=x**2+np.random.randn(1000)
# split the sample into train and test halves
x_train,x_test,y_train,y_test = train_test_split(x,y,test_size=0.5)
# build a data frame for further use with PolynomialFeatures
df=pd.DataFrame([x_train,x_test])
df=df.transpose()
data = df
# perform a polynomial features transform of the dataset
trans = PolynomialFeatures(degree=2)
data = trans.fit_transform(data)
model = sm.OLS(y_train,data).fit()
Then I get the error: ValueError: unrecognized data structures: <class 'pandas.core.arrays.numpy_.PandasArray'> / <class 'numpy.ndarray'>
Do you have any idea how to get this regression to work?
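Solution
The two types in the error message are those of the arguments passed to sm.OLS: y_train is a pandas PandasArray, which statsmodels does not recognize, while data is already a NumPy array. Use the to_numpy() function to convert the pandas array to a NumPy array: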
model = sm.OLS(y_train.to_numpy(),data).fit()