UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples

Problem description

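I am running a Linear SVC classifier on an imbalanced dataset. The target variable is binary, and I upsampled the minority class. Here is the code: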

import pandas as pd
from sklearn import svm
from sklearn.model_selection import train_test_split
from sklearn.utils import resample

# Separate input features and target
y = df.tickets_class
X = df.drop('tickets_class', axis=1)

# Set up testing and training sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=27)

# Recombine the training features and target so rows can be filtered by class
X = pd.concat([X_train, y_train], axis=1)

no_tickets = X[X.tickets_class == 0]
tickets = X[X.tickets_class == 1]

# Upsample the minority class to the size of the majority class
tickets_upsampled = resample(tickets, replace=True, n_samples=len(no_tickets), random_state=27)

# Combine majority and upsampled minority
upsampled = pd.concat([no_tickets, tickets_upsampled])

After that, I define a new X_train and y_train from the upsampled data:

y_train = upsampled.tickets_class
X_train = upsampled.drop('tickets_class', axis=1)

Then I run Linear SVC in a very simple way:

clf = svm.LinearSVC(max_iter=10000, dual=False)
clf.fit(X_train, y_train)
clf_pred = clf.predict(X_test)

Finally, when I plot the model results, I get this warning:

 UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior.
   _warn_prf(average, modifier, msg_start, len(result))

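I know I get this warning because the model predicts only zeros for the target variable. But my question is: how is that possible if both classes have the same number of samples after upsampling?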
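To make the wording of the warning concrete, here is a minimal sketch of the arithmetic behind it, written in plain Python rather than with scikit-learn. Precision is TP / (TP + FP), so when no sample is predicted as the positive class the denominator is zero and a fallback value has to be chosen. The helper `precision`, its `zero_division` argument, and the example label lists are hypothetical names invented for this illustration; they only mimic the behaviour the warning describes and are not the library's implementation.

```python
from collections import Counter


def precision(y_true, y_pred, positive=1, zero_division=0.0):
    """Precision = TP / (TP + FP) for the `positive` label (illustrative helper)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    predicted_positive = tp + fp
    if predicted_positive == 0:
        # Nothing was predicted as positive: the ratio is 0/0, so fall back
        # to the chosen value instead of dividing.
        return zero_division
    return tp / predicted_positive


# Hypothetical labels, not taken from the question's dataset
y_test_example = [0, 0, 1, 0, 1, 0]
all_zero_pred = [0, 0, 0, 0, 0, 0]   # a model that only ever predicts class 0
mixed_pred = [0, 1, 1, 0, 0, 0]      # one false positive, one true positive

print(Counter(all_zero_pred))                     # how often each label is predicted
print(precision(y_test_example, all_zero_pred))   # falls back to zero_division
print(precision(y_test_example, mixed_pred))      # 1 TP / (1 TP + 1 FP)
```

On the real data, the same kind of count (for example `Counter(clf_pred)` and `Counter(y_train)`) would confirm whether the predictions really contain only zeros and whether the training labels are balanced after upsampling. This only verifies the symptom; it does not by itself explain why the classifier behaves that way.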
