Problem description
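I want to generate data with a given imbalance ratio (IR) and disturbance ratio (DR). The imbalance ratio (IR) measures the degree of class imbalance and is defined as the number of majority-class samples divided by the number of minority-class samples. The disturbance ratio (DR) indicates the degree of overlap between the classes. I am interested in generating data like that shown in the figure. Thanks for your time!
Figure: Synthetic datasets with IR = 5 and DR = 70%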
# vary the number of clusters for a 1:100 imbalanced dataset
from collections import Counter
from sklearn.datasets import make_classification
from matplotlib import pyplot
from numpy import where

# number of clusters
clusters = [1, 2]
# create and plot a dataset with different numbers of clusters
for i in range(len(clusters)):
    c = clusters[i]
    # define dataset
    X, y = make_classification(n_samples=10000, n_features=2, n_redundant=0,
                               n_clusters_per_class=c, weights=[0.99],
                               flip_y=0, random_state=1)
    counter = Counter(y)
    # define subplot
    pyplot.subplot(1, 2, 1 + i)
    pyplot.title('Clusters=%d' % c)
    pyplot.xticks([])
    pyplot.yticks([])
    # scatter plot of examples by class label
    for label, _ in counter.items():
        row_ix = where(y == label)[0]
        pyplot.scatter(X[row_ix, 0], X[row_ix, 1], label=str(label))
    pyplot.legend()
# show the figure
pyplot.show()
Solution
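One possible approach, offered as a rough sketch rather than a verified answer, is to sample the two classes directly with NumPy instead of relying on make_classification. The question does not define DR precisely, so the code below assumes that DR is the fraction of minority samples that fall inside the majority-class region. The helper name make_ir_dr_dataset, its parameters, and the choice of uniform squares as class regions are all hypothetical and only for illustration.

```python
from collections import Counter

import numpy as np


def make_ir_dr_dataset(n_minority=200, ir=5.0, dr=0.7, seed=0):
    """Hypothetical helper: 2-D two-class data with a chosen IR and DR.

    Assumption (not from the question): DR is the fraction of minority
    samples that lie inside the majority-class region.
    """
    rng = np.random.default_rng(seed)
    # IR = n_majority / n_minority, so the majority size follows directly.
    n_majority = int(round(ir * n_minority))
    n_overlap = int(round(dr * n_minority))

    # Majority class (label 0): uniform on the square [0, 1) x [0, 1).
    X_maj = rng.uniform(0.0, 1.0, size=(n_majority, 2))

    # Overlapping part of the minority class: same square as the majority.
    X_min_in = rng.uniform(0.0, 1.0, size=(n_overlap, 2))
    # Rest of the minority class: the adjacent, disjoint square [1, 2) x [0, 1).
    X_min_out = rng.uniform(0.0, 1.0, size=(n_minority - n_overlap, 2))
    X_min_out[:, 0] += 1.0

    X = np.vstack([X_maj, X_min_in, X_min_out])
    y = np.concatenate([np.zeros(n_majority, dtype=int),
                        np.ones(n_minority, dtype=int)])
    return X, y


# IR = 5 and DR = 70%, as in the figure caption of the question.
X, y = make_ir_dr_dataset(n_minority=200, ir=5.0, dr=0.7, seed=1)
print(Counter(y.tolist()))  # Counter({0: 1000, 1: 200})
```

The returned X and y can be passed to the same scatter-plot loop used in the question. For comparison, weights=[0.99] in the question's code corresponds to an IR of roughly 99, whereas IR = 5 would need roughly weights=[5/6]; make_classification offers class_sep to change how far apart the clusters are, but as far as I know it has no parameter that sets an overlap percentage directly.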