Why is the n_informative parameter of make_regression() not respected?

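Problem description

I am studying support vector regression (SVR) and generating test data with make_regression(). Changing nothing but the random_state parameter produces very different datasets. With random_state = 12, the generated data matches the n_informative value I specified (2): two features are strongly correlated with the target.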

# Code with random_state = 12
import pandas as pd
from sklearn.datasets import make_regression

# Create the feature matrix and the target vector with make_regression()
reg_feat, reg_target = make_regression(n_samples=30, n_features=5, n_informative=2, n_targets=1, random_state=12)

# Create an empty dictionary
data_dict = {}

# Add each feature column to the dictionary under the key "feature <n>"
for i in range(reg_feat.shape[1]):
    data_dict["feature " + str(i + 1)] = reg_feat[:, i]

# Add the target column to the dictionary
data_dict["target"] = reg_target

# Create a DataFrame from the dictionary
reg_feat_df = pd.DataFrame(data=data_dict)

# Check the pairwise correlations
print(reg_feat_df.corr())

Output:

          feature 1  feature 2  feature 3  feature 4  feature 5    target
feature 1   1.000000   0.035793   0.413027   0.170365  -0.189310  0.917164
feature 2   0.035793   1.000000  -0.085540  -0.266076  -0.097623 -0.011071
feature 3   0.413027  -0.085540   1.000000  -0.014338   0.235447  0.741744
feature 4   0.170365  -0.266076  -0.014338   1.000000  -0.102619  0.119188
feature 5  -0.189310  -0.097623   0.235447  -0.102619   1.000000 -0.036388
target      0.917164  -0.011071   0.741744   0.119188  -0.036388  1.000000

However, when I change random_state to 95, 72, or 75, n_informative no longer seems to be respected: only one feature is highly correlated with the target, even though I set n_informative = 2.

# Code with random_state = 72
import pandas as pd
from sklearn.datasets import make_regression

# Create the feature matrix and the target vector with make_regression()
reg_feat, reg_target = make_regression(n_samples=30, n_features=5, n_informative=2, n_targets=1, random_state=72)

# Create an empty dictionary
data_dict = {}

# Add each feature column to the dictionary under the key "feature <n>"
for i in range(reg_feat.shape[1]):
    data_dict["feature " + str(i + 1)] = reg_feat[:, i]

# Add the target column to the dictionary
data_dict["target"] = reg_target

# Create a DataFrame from the dictionary
reg_feat_df = pd.DataFrame(data=data_dict)

# Check the pairwise correlations
print(reg_feat_df.corr())

Output:

           feature 1  feature 2  feature 3  feature 4  feature 5    target
feature 1   1.000000   0.145218   0.013431  -0.065064   0.231296  0.098015
feature 2   0.145218   1.000000  -0.158852  -0.162734   0.170550 -0.138638
feature 3   0.013431  -0.158852   1.000000  -0.264027   0.039458 -0.261125
feature 4  -0.065064  -0.162734  -0.264027   1.000000  -0.399130  0.986699
feature 5   0.231296   0.170550   0.039458  -0.399130   1.000000 -0.360373
target      0.098015  -0.138638  -0.261125   0.986699  -0.360373  1.000000

So why do some random_state values give the expected result while others do not? Or am I missing something?
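
One way to check whether n_informative is actually being honored, independently of the correlation matrix, is to ask make_regression() for the coefficients of the linear model it used to build the target, via its coef=True argument. The sketch below is only an illustration under the same settings as above (default noise and bias, so the target should be an exact linear combination of the features); the helper name summarize is hypothetical and not part of scikit-learn.

```python
import numpy as np
from sklearn.datasets import make_regression


def summarize(seed):
    """Hypothetical helper: return the ground-truth coefficients and the
    per-feature Pearson correlation with the target for one random_state."""
    # coef=True additionally returns the coefficients of the underlying linear model
    X, y, coef = make_regression(
        n_samples=30, n_features=5, n_informative=2,
        n_targets=1, coef=True, random_state=seed,
    )
    # Correlation of each feature column with the target
    corr = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
    return X, y, coef, corr


results = {}
for seed in (12, 72):
    X, y, coef, corr = summarize(seed)
    results[seed] = (X, y, coef, corr)
    print(f"random_state={seed}")
    print("  coefficients:        ", np.round(coef, 2))
    # Zero-based column indices whose coefficient is non-zero
    print("  informative columns: ", np.flatnonzero(coef))
    print("  correlation w/ target:", np.round(corr, 3))
```

If exactly two coefficients are non-zero for every seed, the parameter is being respected. A weak correlation for one of the informative features would then most plausibly come from its coefficient being much smaller than the other one, combined with sampling noise in only 30 rows, rather than from make_regression() ignoring n_informative.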


data-analysis machine-learning python regression svm