Measuring the F1 score for NER

Problem description

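I am trying to evaluate an AI model for NER (named entity recognition).
To compare it against other benchmarks, I need to calculate the model's F1 score. However, I am not sure how to write the code for this.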

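My idea is:
True positives: equal tokens and equal tags, counted as a true positive for the tag
False negatives: equal tokens and unequal tags, or the token did not appear in the prediction, counted as a false negative for the tag
False positives: the token does not exist but has been assigned a tag, for example: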

Phrase: "this is a test"
Predicted: {token: this is, tag: WHO}
True pairs: {token: this, tag: WHO} {token: a test, tag: WHAT}
In this case, {token: this is, tag: WHO} is considered a false positive for WHO.
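For comparison with published benchmarks, one widely used convention (CoNLL-style exact-match evaluation) scores whole entities rather than individual tokens. A predicted entity is a true positive only if both its span and its tag match a true entity exactly. Any other predicted entity is a false positive for its predicted tag, and any true entity without an exact match is a false negative for its true tag. Below is a minimal sketch of that counting, assuming entities are represented as (span text, tag) pairs. The names `Entity`, `Counts`, and `count_entities` are hypothetical and do not come from the code in this question. A real implementation would probably compare token offsets instead of strings, so that repeated phrases are not conflated.

```cpp
#include <map>
#include <set>
#include <string>
#include <utility>

// Hypothetical representation: (span text, tag), e.g. {"this", "WHO"}.
using Entity = std::pair<std::string, std::string>;

struct Counts {
    int tp = 0;  // predicted entity matches a true entity exactly
    int fp = 0;  // predicted entity with no exact true match
    int fn = 0;  // true entity with no exact predicted match
};

// Accumulates per-tag counts for one phrase under exact-match scoring.
void count_entities(const std::set<Entity>& gold,
                    const std::set<Entity>& predicted,
                    std::map<std::string, Counts>& per_tag) {
    for (const Entity& p : predicted) {
        if (gold.count(p) > 0) {
            per_tag[p.second].tp++;
        } else {
            per_tag[p.second].fp++;
        }
    }
    for (const Entity& g : gold) {
        if (predicted.count(g) == 0) {
            per_tag[g.second].fn++;
        }
    }
}
```

For the phrase above, with true pairs {this, WHO} and {a test, WHAT} and the prediction {this is, WHO}, this yields one false positive and one false negative for WHO, and one false negative for WHAT.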

Code:

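       // Pseudo-code: loop over the predicted (current_tokens, tag, tag_id) pairs of a phrase.
       // val = struct { tokens, tags } holding the true pairs of that phrase.
       for (/* each predicted pair: current_tokens, tag, tag_id */) {
           bool current_token_exists = false;  // reset for every predicted pair
           for (auto const &j : val.tags) {
                if (j.first == current_tokens) {
                    if (j.second == tag) {
                        true_positives[tag_id]++;
                    } else {
                        // Note: this charges the predicted tag; the missed true tag
                        // would presumably be str2tag_id[j.second].
                        false_negatives[tag_id]++;
                    }
                    current_token_exists = true;
                }
            }
            if (!current_token_exists) {
                false_positives[tag_id]++;
            }
        }

        // True pairs whose token never appears among the predicted tokens.
        for (auto const &i : val.tags) {
            bool found = false;
            for (auto const &j : listed_tokens) {
                if (i.first == j) { found = true; break; }
            }
            if (!found) {
                false_negatives[str2tag_id[i.second]]++;
            }
        }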

After this, compute the F1 score:

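    // Copy into floats first: if the totals are integer counters, the divisions would truncate.
    float tp = total_true_positives, fp = total_false_positives, fn = total_false_negatives;
    // Guard against zero denominators (no predictions, or no true entities).
    float precision_total = (tp + fp > 0) ? tp / (tp + fp) : 0.0f;
    float recall_total = (tp + fn > 0) ? tp / (tp + fn) : 0.0f;
    float f_1_total = (precision_total + recall_total > 0)
        ? (2 * precision_total * recall_total) / (precision_total + recall_total) : 0.0f;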

However, I think I have some of the concepts wrong. Does anyone have an opinion?

artificial-intelligence measurement ner nlp