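Problem description
I am running a LinearSVC classifier on an imbalanced dataset. The target variable is binary, and I upsampled the minority class. Here is the code: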
import pandas as pd
from sklearn import svm
from sklearn.model_selection import train_test_split
from sklearn.utils import resample

# Separate input features and target
y = df.tickets_class
X = df.drop('tickets_class', axis=1)
# Set up training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=27)
# Rejoin training features and labels so the rows can be split by class
X = pd.concat([X_train, y_train], axis=1)
no_tickets = X[X.tickets_class == 0]
tickets = X[X.tickets_class == 1]
# Upsample the minority class
tickets_upsampled = resample(tickets, replace=True, n_samples=len(no_tickets), random_state=27)
# Combine majority and upsampled minority
upsampled = pd.concat([no_tickets, tickets_upsampled])
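A quick way to confirm that the upsampling step did what was intended is to count the classes before and after. The sketch below runs on a small made-up frame: the `feature` column, the `train` name, and the 90/10 split are assumptions for illustration, while `tickets_class` and the `resample` call mirror the code above.

```python
import pandas as pd
from sklearn.utils import resample

# Hypothetical toy training frame: 90 majority rows (0) and 10 minority rows (1).
train = pd.DataFrame({
    "feature": range(100),
    "tickets_class": [0] * 90 + [1] * 10,
})

no_tickets = train[train.tickets_class == 0]
tickets = train[train.tickets_class == 1]

# Upsample the minority class with replacement to match the majority count.
tickets_upsampled = resample(
    tickets, replace=True, n_samples=len(no_tickets), random_state=27
)
upsampled = pd.concat([no_tickets, tickets_upsampled])

# Class counts before and after upsampling.
counts_before = train.tickets_class.value_counts().to_dict()
counts_after = upsampled.tickets_class.value_counts().to_dict()
print("before:", counts_before)
print("after: ", counts_after)
```

If the counts after upsampling are equal, the resampling itself is not the issue and attention can move to the model and the features.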
After that, I define a new X_train and y_train from the upsampled data:
y_train = upsampled.tickets_class
X_train = upsampled.drop('tickets_class',axis = 1)
Then I run LinearSVC in a very simple way:
clf = svm.LinearSVC(max_iter = 10000,dual = False)
clf.fit(X_train,y_train)
clf_pred = clf.predict(X_test)
Finally, when I plot the model's results, I get this warning:
UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
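The warning appears when no sample is predicted as the positive class, so precision, TP / (TP + FP), has a zero denominator. The following is a minimal sketch with made-up labels (`y_true` and `y_pred` are not from the original data) showing how the `zero_division` parameter of `precision_score` sets the value reported in that case:

```python
from sklearn.metrics import precision_score

# Made-up labels: the true data contains positives, but every prediction is 0.
y_true = [0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0]

# zero_division=0 reports the undefined precision as 0.0 without warning.
p_zero = precision_score(y_true, y_pred, zero_division=0)
# zero_division=1 reports it as 1.0 instead.
p_one = precision_score(y_true, y_pred, zero_division=1)
print(p_zero, p_one)
```

Setting `zero_division` only changes how the undefined metric is reported; it does not change what the model predicts.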
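I know I get this warning because the model predicts only zeros for the target variable. But my question is: if both classes have the same number of samples after upsampling, how is that possible?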
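Balanced class counts in the training set do not by themselves guarantee that both classes appear in the predictions, so a reasonable first step is to inspect what the model actually outputs on the test set. The sketch below is a diagnostic only, under the assumption that synthetic data from `make_classification` can stand in for the real `df`; the names `pred_counts` and `cm` are introduced here for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# Synthetic imbalanced data standing in for the real df (an assumption).
X, y = make_classification(
    n_samples=1000, n_features=10, weights=[0.9, 0.1], random_state=27
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=27
)

# Same classifier settings as in the question.
clf = LinearSVC(max_iter=10000, dual=False)
clf.fit(X_train, y_train)
clf_pred = clf.predict(X_test)

# How many test samples were assigned to each class?
labels, counts = np.unique(clf_pred, return_counts=True)
pred_counts = dict(zip(labels.tolist(), counts.tolist()))
print("predicted class counts:", pred_counts)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, clf_pred, labels=[0, 1])
print(cm)
```

Running the same two checks on the real `clf_pred` and `y_test` shows whether the model truly never predicts class 1, and comparing against predictions on `X_train` can indicate whether the behaviour is specific to the test set.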