scikit-learn train_test_split: ValueError: Found input variables with inconsistent numbers of samples: [4999, 5000]

Problem description

Here is my code:

print(len(image_dataset.data))
print(len(phylum_target))
X_train,X_test,y_train,y_test = train_test_split(image_dataset.data,phylum_target,test_size=0.2,random_state=109)

Here is the output, including the error:

5000
5000
Traceback (most recent call last):
  File "Image_SVM_run_only.py",line 298,in <module>
    X_train_temp,X_test_temp,y_train_temp,y_test_temp = train_test_split(image_dataset.data,random_state=109)
  File "/root/anaconda3/envs/IBC/lib/python3.7/site-packages/sklearn/model_selection/_split.py",line 2127,in train_test_split
    arrays = indexable(*arrays)
  File "/root/anaconda3/envs/IBC/lib/python3.7/site-packages/sklearn/utils/validation.py",line 293,in indexable
    check_consistent_length(*result)
  File "/root/anaconda3/envs/IBC/lib/python3.7/site-packages/sklearn/utils/validation.py",line 257,in check_consistent_length
    " samples: %r" % [int(l) for l in lengths])
ValueError: Found input variables with inconsistent numbers of samples: [4999,5000]

Even though the data and the target have the same length (both print 5000), I still get this error. Please help me T.T

Solution

Here is the minimal reproducible example I could put together from the information you gave, and it runs without any error:

import numpy as np
from sklearn.model_selection import train_test_split

# Dummy arrays with the same number of samples (5000) as in the question
X = np.zeros((5000, 49152))
y = np.zeros((5000, 1))
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.2,random_state=109)
print(X_train.shape,X_test.shape,y_train.shape,y_test.shape)
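Since the call as posted works, the mismatch probably comes from somewhere else. Note that the traceback points at line 298 of Image_SVM_run_only.py, and the statement shown there uses different variable names (X_train_temp, ...) than the snippet in the question, so the failing call may not be the one you printed the lengths for. As a rough sketch, assuming one of the arrays reaching that call really has 4999 samples, the following reproduces the same message and shows a way to print the lengths right before the split. The arrays and the describe_lengths helper are hypothetical stand-ins, not part of your code or of scikit-learn.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-ins for image_dataset.data and phylum_target,
# with one sample deliberately missing from X.
X = np.zeros((4999, 8))
y = np.zeros(5000)


def describe_lengths(**arrays):
    """Return {name: number of samples} for each array passed in."""
    return {name: len(a) for name, a in arrays.items()}


# Print the lengths immediately before the call that fails.
lengths = describe_lengths(X=X, y=y)
print(lengths)

try:
    train_test_split(X, y, test_size=0.2, random_state=109)
    error_message = None
except ValueError as exc:
    # Same ValueError as in the question's traceback.
    error_message = str(exc)

print(error_message)
```

Putting the same kind of length check directly above line 298 of your script, using the exact variables passed there, should show which input is one sample short.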