How can I find the most relevant variables using PCA dimensionality reduction?

Problem description

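I am new to Python and am working on a project, but I am stuck on the coding part. I have one target variable and 23 related variables. My dataset is 11 (samples) × 23 (descriptors), and the target dataset is 11 (samples) × 1 (target variable). How can I use PCA dimensionality reduction to find the most relevant variables among these 23 descriptors?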

    from sklearn.decomposition import PCA
    # df2 holds the 11 x 23 descriptor table
    pca = PCA()
    pca.fit(df2)
    transformed = pca.transform(df2)

    from sklearn.decomposition import PCA
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import Pipeline
    # Chain PCA with a classifier
    steps = [('pca', PCA()), ('m', LogisticRegression())]
    model = Pipeline(steps=steps)

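    from sklearn.decomposition import PCA
    from sklearn.preprocessing import MinMaxScaler
    # Scale the inputs, reduce with PCA, then classify
    steps = [('norm', MinMaxScaler()), ('pca', PCA()), ('m', LogisticRegression())]
    model = Pipeline(steps=steps)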

    from sklearn.datasets import make_classification
    # Synthetic data with the same shape as the real dataset: 11 samples, 23 features
    X, y = make_classification(n_samples=11, n_features=23, n_informative=5, n_redundant=18, random_state=7)
    print(X.shape, y.shape)

    from numpy import mean
    from numpy import std
    from sklearn.datasets import make_classification
    from sklearn.model_selection import cross_val_score
    from sklearn.model_selection import RepeatedStratifiedKFold
    from sklearn.pipeline import Pipeline
    from sklearn.decomposition import PCA
    from sklearn.linear_model import LogisticRegression
    # define the dataset (same settings as the make_classification call above)
    X, y = make_classification(n_samples=11, n_features=23, n_informative=5, n_redundant=18, random_state=7)
    # define the pipeline: PCA down to 4 components, then logistic regression
    steps = [('pca', PCA(n_components=4)), ('m', LogisticRegression())]
    model = Pipeline(steps=steps)
    # evaluate the pipeline with repeated stratified k-fold cross-validation
    cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=3, random_state=1)
    n_scores = cross_val_score(model, X, y, scoring='accuracy', cv=cv, n_jobs=-1, error_score='raise')
    # report performance
    print('Accuracy: %.3f (%.3f)' % (mean(n_scores), std(n_scores)))

[Image: the data]

[Image: expected plot] This is the plot I am supposed to get, but I don't know how to produce it.

Solution

No confirmed solution to this problem has been posted yet.
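As a possible starting point, and not a confirmed answer, one common way to read "relevance" out of a fitted PCA is to inspect the component weights in `pca.components_`. The sketch below makes several assumptions: it uses random synthetic data in place of the real 11 × 23 table, the column names `desc_0` … `desc_22` are hypothetical, and keeping 3 components is an arbitrary choice. It standardizes the descriptors, fits PCA, and ranks each descriptor by its squared weights across the kept components, weighted by each component's explained variance ratio. Note that PCA never looks at the target variable, so this ranking reflects which descriptors drive the variance in the descriptor table, not which ones best predict the target.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the 11 x 23 descriptor table (df2 in the question).
# The values and the column names are made up for illustration.
rng = np.random.default_rng(0)
columns = [f"desc_{i}" for i in range(23)]
df2 = pd.DataFrame(rng.normal(size=(11, 23)), columns=columns)

# Standardize so descriptors measured on large scales do not dominate.
X_scaled = StandardScaler().fit_transform(df2)

# Keep the first 3 components (an arbitrary choice for this sketch).
pca = PCA(n_components=3)
pca.fit(X_scaled)

# Loadings table: one row per descriptor, one column per component.
loadings = pd.DataFrame(
    pca.components_.T * np.sqrt(pca.explained_variance_),
    index=df2.columns,
    columns=[f"PC{i + 1}" for i in range(pca.n_components_)],
)

# Score each descriptor: squared weight on every kept component,
# weighted by the share of variance that component explains.
importance = (pca.components_.T ** 2 * pca.explained_variance_ratio_).sum(axis=1)
ranking = pd.Series(importance, index=df2.columns).sort_values(ascending=False)

print(pca.explained_variance_ratio_)
print(ranking.head(5))
```

With the real data, `df2` would simply be the existing DataFrame. If the goal is relevance to the target rather than to the overall variance, a supervised method (for example, correlating each descriptor with the target) may be a better fit than PCA alone.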