How do I calculate the ROC AUC score from positive-unlabeled learning?

Problem description

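I'm trying to adapt some code for positive-unlabeled learning from this example. It runs with my data, but I also want to calculate the ROC AUC score, and that is where I'm stuck.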

My data is split into positive samples (data_P) and unlabeled samples (data_U), each with only 2 features/data columns, for example:

# 3 example rows:
data_P

[[-1.471,5.766],[-1.672,5.121],[-1.371,4.619]]

# 3 example rows:
data_U

[[1.23,6.26],[-5.72,4.1213],[-3.1,7.129]]
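A minimal sketch of the input layout, assuming data_P and data_U are NumPy arrays of shape (n_samples, 2). The bagging code reads data_P.shape[0] and fancy-indexes data_U with an array of row indices, which plain Python lists do not support. Only the three example rows shown above are used.

```python
import numpy as np

# The example rows from the question as 2-D float arrays of shape
# (n_samples, n_features); plain lists have no .shape and cannot be
# indexed with an array of row indices.
data_P = np.array([[-1.471, 5.766], [-1.672, 5.121], [-1.371, 4.619]])
data_U = np.array([[1.23, 6.26], [-5.72, 4.1213], [-3.1, 7.129]])

print(data_P.shape, data_U.shape)  # (3, 2) (3, 2)
```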

I ran the positive-unlabeled learning from the linked example:

import numpy as np
from sklearn.tree import DecisionTreeClassifier

known_labels_ratio = 0.5

NP = data_P.shape[0]
NU = data_U.shape[0]

T = 1000
K = NP
train_label = np.zeros(shape=(NP+K,))
train_label[:NP] = 1.0
n_oob = np.zeros(shape=(NU,))
f_oob = np.zeros(shape=(NU,2))
for i in range(T):
    # Bootstrap resample
    bootstrap_sample = np.random.choice(np.arange(NU),replace=True,size=K)
    # Positive set + bootstrapped unlabeled set
    data_bootstrap = np.concatenate((data_P,data_U[bootstrap_sample,:]),axis=0)
    # Train model
    model = DecisionTreeClassifier(max_depth=None,max_features=None,criterion='gini',class_weight='balanced')
    model.fit(data_bootstrap,train_label)
    # Indices of the out-of-bag (OOB) samples
    idx_oob = sorted(set(range(NU)) - set(np.unique(bootstrap_sample)))
    # Transductive learning of oob samples
    f_oob[idx_oob] += model.predict_proba(data_U[idx_oob])
    n_oob[idx_oob] += 1
    
predict_proba = f_oob[:,1]/n_oob

This all runs fine, but what I want is to run roc_auc_score(), and I'm stuck on how to do that without getting an error.

Currently I'm trying:

y_pred = model.predict_proba(data_bootstrap)
roc_auc_score(train_label,y_pred)
ValueError: bad input shape (3,2)
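A minimal reproduction of the error, reusing the three y_pred rows from the question. The labels in y_true are hypothetical, picked only so that both classes occur. For a binary target, sklearn.metrics.roc_auc_score expects y_score to be a 1-D array of scores for the positive class, so the (3, 2) array is rejected while its second column is accepted. The exact error text depends on the scikit-learn version ("bad input shape" in older releases).

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# The y_pred rows printed in the question: one column per class.
y_pred = np.array([[0.00554287, 0.9944571],
                   [0.0732314, 0.9267686],
                   [0.16861796, 0.83138204]])

# Hypothetical labels, chosen only so both classes are present.
y_true = np.array([1.0, 0.0, 1.0])

print(y_pred.shape)      # (3, 2): rejected as y_score for a binary target
y_score = y_pred[:, 1]   # keep only the positive-class column
print(y_score.shape)     # (3,)
print(roc_auc_score(y_true, y_score))  # 0.5
```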

The problem seems to be that y_pred gives a 2-column output, like this:

y_pred
array([[0.00554287,0.9944571 ],[0.0732314,0.9267686 ],[0.16861796,0.83138204]])

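I'm not sure why y_pred ends up like this. Is it giving the probability of each sample falling into one of 2 groups, essentially positive or other? Could I filter these to select the highest-scoring probability in each row? Or is there a way for me to change this, or another way to calculate the ROC AUC score?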
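A sketch under stated assumptions. predict_proba returns one column per class in the order of model.classes_ (sorted, here [0.0, 1.0]), so each row is [P(label 0), P(label 1)] and column 1 is the positive-class score that roc_auc_score needs. The row-wise maximum would mix the two classes (a row [0.9, 0.1] and a row [0.1, 0.9] both give 0.9), so it is not a ranking score. Note also that model after the loop is only the last tree, and data_bootstrap is the set it was trained on, so scoring it measures training fit. Because data_U has no true labels, an AUC for the PU output needs labels from somewhere. The code below assumes synthetic data in which half of the positives are hidden in the unlabeled set (blob positions, sizes and the split are all hypothetical). It reruns the question's bagging loop with T = 100 instead of 1000, default tree settings (equivalent to the question's max_depth=None, max_features=None, criterion='gini'), and np.setdiff1d in place of the sorted set difference. It then scores the 1-D OOB array against the hidden labels.

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Hypothetical synthetic data: two well-separated 2-D Gaussian blobs.
pos = rng.normal(loc=[-1.5, 5.0], scale=0.5, size=(200, 2))
neg = rng.normal(loc=[2.0, 5.0], scale=0.5, size=(200, 2))

# Assumption: half of the positives are hidden inside the unlabeled set,
# so the true labels of data_U (y_U) are known, for evaluation only.
data_P = pos[:100]
data_U = np.concatenate((pos[100:], neg), axis=0)
y_U = np.concatenate((np.ones(100), np.zeros(200)))

NP, NU = data_P.shape[0], data_U.shape[0]
T, K = 100, NP  # fewer trees than the question's T = 1000, for speed
train_label = np.zeros(NP + K)
train_label[:NP] = 1.0
n_oob = np.zeros(NU)
f_oob = np.zeros((NU, 2))
for _ in range(T):
    bootstrap_sample = rng.choice(NU, size=K, replace=True)
    data_bootstrap = np.concatenate((data_P, data_U[bootstrap_sample]), axis=0)
    model = DecisionTreeClassifier(class_weight="balanced", random_state=0)
    model.fit(data_bootstrap, train_label)
    # Out-of-bag indices: unlabeled samples not drawn in this bootstrap
    idx_oob = np.setdiff1d(np.arange(NU), bootstrap_sample)
    # predict_proba columns follow model.classes_ == [0.0, 1.0]
    f_oob[idx_oob] += model.predict_proba(data_U[idx_oob])
    n_oob[idx_oob] += 1

# Column 1 averaged over OOB rounds: a 1-D positive-class score per sample
score_U = f_oob[:, 1] / n_oob
auc = roc_auc_score(y_U, score_U)
print(model.classes_, score_U.shape, round(auc, 3))
```

With real PU data, where the labels of data_U are genuinely unknown, this AUC cannot be computed directly; the hidden-positive setup is only one way to get a labeled benchmark.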

auc machine-learning numpy python scikit-learn