Error when computing sklearn.metrics.ndcg_score

Problem description

I am trying to compute the NDCG score of a classifier, but I get this error:

ValueError: Only ('multilabel-indicator', 'continuous-multioutput', 'multiclass-multioutput') formats are supported. Got multiclass instead

Here is my code:

# Declare classifier,fit on data and make predictions
from sklearn.ensemble import RandomForestClassifier
rnd_forest = RandomForestClassifier()
rnd_forest.fit(X_train_tr,y_train)
y_pred_prob = rnd_forest.predict_proba(X_train_tr)

# Calculate ndcg score
from sklearn.metrics import ndcg_score
# This is where I get an error
ndcg_score(y_train,y_pred_prob,k=5)

This is what my targets and predicted probabilities look like:

# True labels of the first two samples
y_train[:2]
> array([7,7])
    
# Predicted probabilities for first two observation
y_pred_prob[:2]
> array([[0.,0.,1.,0.],[0.,0.]])

I tried reshaping y_train into a 2D array, but that did not work. Can anyone tell me how to fix this error?

Solution

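Suppose y_train contains N observations. ndcg_score expects the true labels as a 2D array with the same shape as y_pred_prob, so you have to convert y_train into a matrix with N rows and 12 columns (one column per class), with a 1 in the column of each sample's true class.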

import numpy as np

# Create an ndarray of shape (N, 12) filled with zeros
y_train_matrix = np.zeros(shape=(y_pred_prob.shape[0], y_pred_prob.shape[1]))
# Write a 1 in each row at the column of its true class
# (assumes y_train holds integer class indices 0..11)
y_train_matrix[np.arange(y_pred_prob.shape[0]), y_train] = 1
# You now have this ndarray
y_train_matrix

array([[0.,0.,1.,0.],[0.,0.]])
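To check that the conversion produces a format ndcg_score accepts, one option is to inspect the target type with sklearn.utils.multiclass.type_of_target, which reports the same format names that appear in the error message. The sketch below is only an illustration: the labels in y_example and the value n_classes = 12 are made up and are not the asker's data.

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

# Hypothetical labels for four samples drawn from 12 classes (0..11)
y_example = np.array([7, 7, 2, 11])
n_classes = 12

# Same one-hot construction as in the answer's code
y_example_matrix = np.zeros((y_example.shape[0], n_classes))
y_example_matrix[np.arange(y_example.shape[0]), y_example] = 1

# A 1D array of class labels is reported as a multiclass target,
# while the 0/1 matrix is reported as a multilabel indicator
target_type_before = type_of_target(y_example)
target_type_after = type_of_target(y_example_matrix)
print(target_type_before)
print(target_type_after)
```

The first type is the one the error complains about. The second is one of the supported formats.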

Now you can compute the score:

ndcg_score(y_train_matrix,y_pred_prob)

1.0
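One caveat: the indexing trick in the answer assumes the labels are integers that coincide with the column positions of y_pred_prob. The columns of predict_proba follow the order of the classifier's classes_ attribute, so if the labels are strings or do not start at 0, a safer variant may be to build the matrix with sklearn.preprocessing.label_binarize. The following is a minimal end-to-end sketch on synthetic data. The names X_demo, y_demo, clf and proba are placeholders rather than the asker's variables, and it assumes more than two classes, since label_binarize returns a single column for binary problems.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import ndcg_score
from sklearn.preprocessing import label_binarize

# Synthetic stand-in for X_train_tr / y_train: 200 samples, 4 classes
X_demo, y_demo = make_classification(
    n_samples=200, n_features=10, n_informative=6,
    n_classes=4, random_state=0,
)

clf = RandomForestClassifier(random_state=0)
clf.fit(X_demo, y_demo)
proba = clf.predict_proba(X_demo)  # columns follow clf.classes_

# One-hot encode the labels in the same column order as predict_proba
y_true_matrix = label_binarize(y_demo, classes=clf.classes_)

# k limits the ranking to the top-k predicted classes per sample
score = ndcg_score(y_true_matrix, proba, k=3)
print(score)
```

Passing k here mirrors the k=5 in the question's original call. Omitting it, as the answer does, scores the full ranking of all classes.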