Handling imbalanced datasets in classification

Problem description

I have a large DataFrame of accounting-fraud data, and I want to deal with its class imbalance.

First, I split the DataFrame into two parts: X (the features) and y (the target, i.e. fraud or not fraud).

I tried this:

from collections import Counter

from imblearn.combine import SMOTEENN
from imblearn.over_sampling import SMOTE

# Features (df is the accounting-fraud DataFrame, already loaded)
X = df[['fyear','gvkey','sich','insbnk','understatement','option','p_aaer','new_p_aaer','act','ap','at','ceq','che','cogs','csho','dlc','dltis','dltt','dp','ib','invt','ivao','ivst','lct','lt','ni','ppegt','pstk','re','rect','sale','sstk','txp','txt','xint','prcc_f','dch_wc','ch_rsst','dch_rec','dch_inv','soft_assets','ch_cs','ch_cm','ch_roa','issue','bm','dpi','reoa','EBIT','ch_fcf']]
# Target (fraud or not fraud), selected here as a one-column DataFrame
y = df[['target']]

sm = SMOTE(random_state=42)
X_res, y_res = sm.fit_resample(X, y)
print('Resampled dataset shape {}'.format(Counter(y_res)))

And this:

# define sampling strategy
sample = SMOTEENN(sampling_strategy=0.5)
# fit and apply the transform
X_over,y_over = sample.fit_resample(X,y)
# summarize class distribution
print(Counter(y_over)) 

But in both cases I get the same error:

ValueError: could not convert string to float: '2.461.242' 

Can anyone help me, please?

Solution

No accepted solution has been posted for this question yet.
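The error message points to a likely cause, though the asker has not confirmed it. '2.461.242' looks like a number written with '.' as the thousands separator. That suggests at least one column in X is stored as text, and SMOTE needs a fully numeric, NaN-free feature matrix. If the data is read from a CSV file, `pd.read_csv(path, thousands='.', decimal=',')` may fix this at load time, assuming the whole file uses that European format. Otherwise the text columns can be converted after loading. The sketch below assumes '.' marks thousands and ',' marks decimals. The helper name `to_numeric_european` and the demo values are hypothetical. A column that mixes this format with ordinary decimals such as '0.25' would be converted incorrectly.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def to_numeric_european(df, columns=None):
    """Return a copy of df with text number columns converted to numbers.

    Hypothetical helper. Assumes the selected columns hold strings such as
    '2.461.242' or '1.234,5', where '.' is the thousands separator and ','
    is the decimal separator.
    """
    out = df.copy()
    if columns is None:
        # Only touch columns that pandas did not already parse as numbers.
        columns = [c for c in out.columns if not is_numeric_dtype(out[c])]
    for col in columns:
        cleaned = (
            out[col]
            .str.replace(".", "", regex=False)   # drop thousands separators
            .str.replace(",", ".", regex=False)  # decimal comma -> decimal point
        )
        # errors="coerce": anything that still cannot be parsed becomes NaN.
        out[col] = pd.to_numeric(cleaned, errors="coerce")
    return out


# Hypothetical demo frame using two of the asker's column names.
X_demo = pd.DataFrame({"at": ["2.461.242", "1.234,5", "7"], "fyear": [1990, 1991, 1992]})
print(to_numeric_european(X_demo))
```

For the asker's data this would be `X = to_numeric_european(X)`. Running `X.isna().sum()` afterwards shows any values that still failed to parse. SMOTE rejects NaN, so those rows need to be dropped or imputed before calling `fit_resample`.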

