How do I integrate G-mean into sklearn's cross_validate function?

Problem description

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
scores = cross_validate(LogisticRegression(class_weight='balanced', max_iter=100000), X, y, cv=5, scoring=('roc_auc', 'average_precision', 'f1', 'recall', 'balanced_accuracy'))
# Mean of each metric across the five folds
scores['test_roc_auc'].mean(), scores['test_average_precision'].mean(), scores['test_f1'].mean(), scores['test_recall'].mean(), scores['test_balanced_accuracy'].mean()

How can I compute the following G-mean through the scoring parameter of the cross-validation call above?

from imblearn.metrics import geometric_mean_score
print('The geometric mean is {}'.format(geometric_mean_score(y_test,y_test_pred)))

or

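import numpy as np
from sklearn.metrics import accuracy_score
g_mean = 1.0
# Multiply the per-class accuracies (the accuracy within one class is that class's recall)
for label in np.unique(y_test):
    idx = (y_test == label)
    g_mean *= accuracy_score(y_test[idx], y_test_pred[idx])
# The square root gives the geometric mean only when there are exactly two classes
g_mean = np.sqrt(g_mean)
score = g_mean
print(score)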

Solution

Simply pass it as a custom scorer:

from sklearn.metrics import make_scorer
from imblearn.metrics import geometric_mean_score

gm_scorer = make_scorer(geometric_mean_score,greater_is_better=True,average='binary')

Set greater_is_better=True because higher is better here: the best possible value is 1. Any other parameters of geometric_mean_score can be passed directly to make_scorer.
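To illustrate how keyword arguments given to make_scorer are forwarded to the metric, here is a minimal sketch that avoids the imblearn dependency. The function g_mean_corrected and its correction parameter are hypothetical stand-ins written for this example (loosely modelled on the idea of replacing a zero recall with a small value); they are not imblearn's implementation.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import make_scorer, recall_score


def g_mean_corrected(y_true, y_pred, correction=0.0):
    """Hypothetical G-mean: geometric mean of per-class recalls.

    Recalls equal to zero are replaced by `correction` (an assumption
    made for this sketch) so that one missed class does not force 0.
    """
    labels = np.unique(y_true)
    recalls = recall_score(y_true, y_pred, labels=labels, average=None)
    recalls = np.where(recalls == 0.0, correction, recalls)
    return float(np.prod(recalls) ** (1.0 / len(labels)))


# Toy data: a classifier that always predicts the majority class (0)
X = np.zeros((6, 1))
y = np.array([0, 0, 0, 0, 1, 1])
clf = DummyClassifier(strategy="most_frequent").fit(X, y)

# Extra keyword arguments of make_scorer are passed on to the metric
plain = make_scorer(g_mean_corrected, greater_is_better=True)
corrected = make_scorer(g_mean_corrected, greater_is_better=True, correction=0.01)

plain_score = plain(clf, X, y)          # recall of class 1 is 0, so the product is 0
corrected_score = corrected(clf, X, y)  # sqrt(1.0 * 0.01)
print(plain_score, corrected_score)
```

The two scorers differ only in the correction keyword, which shows that the argument reaches the metric function unchanged.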

Full example

from sklearn.model_selection import cross_validate
from sklearn.metrics import make_scorer
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from imblearn.metrics import geometric_mean_score

X,y = load_breast_cancer(return_X_y=True)

gm_scorer = make_scorer(geometric_mean_score,greater_is_better=True)

scores = cross_validate(
    LogisticRegression(class_weight='balanced',max_iter=100000),X,y,cv=5,scoring=gm_scorer
)
scores
>>>
{'fit_time': array([0.76488066,0.69808364,1.22158527,0.94157672,1.01577377]),'score_time': array([0.00103951,0.00100923,0.00065804,0.00071168,0.00068736]),'test_score': array([0.91499142,0.93884403,0.9860133,0.92439026,0.9525989 ])}

Edit

To specify multiple metrics, pass a dictionary to the scoring parameter:

scores = cross_validate(
    LogisticRegression(class_weight='balanced', max_iter=100000), X, y, cv=5,
    scoring={'gm_scorer': gm_scorer, 'AUC': 'roc_auc', 'Avg_Precision': 'average_precision'}
)
scores
>>>
{'fit_time': array([1.03509665,0.96399784,1.49760461,1.13874388,1.32006526]),'score_time': array([0.00560617,0.00357151,0.0057447,0.00566769,0.00549698]),'test_gm_scorer': array([0.91499142,0.9525989 ]),'test_AUC': array([0.99443171,0.99344907,0.99801587,0.97949735,0.99765258]),'test_Avg_Precision': array([0.99670544,0.99623085,0.99893162,0.98640759,0.99861043])}

You need to create a custom scorer; here is an example: https://stackoverflow.com/a/53850851/12384070 Then, if that is the only scorer you want, you can do:

scores = cross_validate(LogisticRegression(class_weight='balanced'), X, y, scoring=your_custom_function)
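As a rough sketch of what such a custom function might look like, the snippet below wraps a hand-written G-mean in make_scorer. The name g_mean is an assumption made for this example, and the function takes the K-th root of the product of per-class recalls (K being the number of classes), which reduces to the square root in the binary case.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer, recall_score
from sklearn.model_selection import cross_validate


def g_mean(y_true, y_pred):
    """Geometric mean of the per-class recalls (illustrative helper)."""
    labels = np.unique(y_true)
    # average=None returns one recall value per label
    recalls = recall_score(y_true, y_pred, labels=labels, average=None)
    return float(np.prod(recalls) ** (1.0 / len(labels)))


# make_scorer turns a metric(y_true, y_pred) into a scorer(estimator, X, y)
your_custom_function = make_scorer(g_mean, greater_is_better=True)

X, y = load_breast_cancer(return_X_y=True)
scores = cross_validate(
    LogisticRegression(class_weight="balanced", max_iter=100000),
    X, y, cv=5, scoring=your_custom_function,
)
print(scores["test_score"])
```

With a single scorer, the per-fold results are stored under the key test_score.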

I think you can also use other scorers alongside it, as described in the documentation:

If scoring represents multiple scores, one can use:

a list or tuple of unique strings;

a callable returning a dictionary where the keys are the metric names and the values are the metric scores;

a dictionary with metric names as keys and callables as values.
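The second option in that list, a callable returning a dictionary, could look roughly like the sketch below. The name multi_metric_scorer and the choice of metrics are assumptions for illustration; the callable receives the fitted estimator together with the validation data, so the predictions are computed once and reused for every metric.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score, recall_score
from sklearn.model_selection import cross_validate


def multi_metric_scorer(estimator, X, y):
    """Return several metrics at once as {metric name: score}."""
    y_pred = estimator.predict(X)
    recalls = recall_score(y, y_pred, labels=np.unique(y), average=None)
    return {
        # Geometric mean of the per-class recalls
        "g_mean": float(np.prod(recalls) ** (1.0 / len(recalls))),
        "balanced_accuracy": balanced_accuracy_score(y, y_pred),
    }


X, y = load_breast_cancer(return_X_y=True)
scores = cross_validate(
    LogisticRegression(class_weight="balanced", max_iter=100000),
    X, y, cv=5, scoring=multi_metric_scorer,
)
# Each dictionary key shows up with a "test_" prefix
print(scores["test_g_mean"], scores["test_balanced_accuracy"])
```

Since balanced accuracy is the arithmetic mean of the same per-class recalls, it should never come out lower than the G-mean in any fold.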
