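Why does SacreBLEU return a zero BLEU score for short sentences?

Problem

Why does sacrebleu need sentences to end with a period? If I remove the period, the score is zero.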

import sacrebleu

# One hypothesis, scored against two reference streams
sys = ["This is cat."]
refs = [["This is a cat."], ["This is a bad cat."]]

b3 = sacrebleu.corpus_bleu(sys, refs)
print("b3", b3.score)
print("b3", round(b3.score, 2))

This returns the following:

b3 35.1862973998119
b3 35.19

When I remove the final period:

sys = ["This is cat"]
refs = [["This is a cat"], ["This is a bad cat"]]

b3 = sacrebleu.corpus_bleu(sys, refs)
print("b3", b3.score)
print("b3", round(b3.score, 2))

sacrebleu prints zero, which seems strange:

b3 0.0
b3 0.0

Solution

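BLEU is defined as the geometric mean of the (modified) n-gram precisions for unigrams up to 4-grams, multiplied by the brevity penalty. So if the whole test set contains no matching 4-gram (no matching 4-tuple of words), BLEU is 0 by definition. The final period is tokenized as a separate token, so with it the hypothesis has four tokens and therefore contains a 4-gram. That 4-gram still matches nothing in the references, but the smoothing sacrebleu applies by default turns the zero match count into a small nonzero precision. Without the period the hypothesis has only three tokens and no 4-gram at all, so there is nothing to smooth and the score is 0.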
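To make the arithmetic concrete, here is a minimal pure-Python sketch of single-sentence BLEU. It is not sacrebleu's code: `ngrams` and `toy_bleu` are hypothetical helper names, the input is assumed to be already tokenized (period split off), and the smoothing only imitates an exponential scheme in which each order with zero matches gets a precision that is halved again. Under those assumptions it gives a score close to the 35.19 reported in the question for the version with the period, and 0 for the version without it.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def toy_bleu(hyp, refs, max_order=4):
    """Illustrative single-sentence BLEU (0-100) on pre-tokenized input."""
    log_sum, smooth = 0.0, 1.0
    for n in range(1, max_order + 1):
        hyp_counts = ngrams(hyp, n)
        total = sum(hyp_counts.values())
        if total == 0:
            # The hypothesis is too short to contain any n-gram of this
            # order, so there is no count to smooth: the score is 0.
            return 0.0
        # Clip each hypothesis n-gram by its maximum count in any reference.
        max_ref = Counter()
        for ref in refs:
            for gram, count in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        correct = sum(min(c, max_ref[g]) for g, c in hyp_counts.items())
        if correct == 0:
            # Assumed exponential smoothing: halve the stand-in precision
            # each time an order has no match.
            smooth *= 2
            precision = 1.0 / (smooth * total)
        else:
            precision = correct / total
        log_sum += math.log(precision)
    # Brevity penalty against the closest reference length (shorter on ties).
    hyp_len = len(hyp)
    ref_len = min((abs(len(r) - hyp_len), len(r)) for r in refs)[1]
    bp = 1.0 if hyp_len >= ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_sum / max_order)


with_period = toy_bleu("This is cat .".split(),
                       ["This is a cat .".split(), "This is a bad cat .".split()])
without_period = toy_bleu("This is cat".split(),
                          ["This is a cat".split(), "This is a bad cat".split()])
print(round(with_period, 2), without_period)
```

With the period, the precisions are 4/4, 2/3, and two smoothed values of 1/4 (trigrams and 4-grams both have no match), and the brevity penalty is exp(1 - 5/4). Without it, the loop stops at order 4 because the hypothesis has no 4-gram.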

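BLEU was designed for scoring test sets with hundreds of sentences, where this situation is very unlikely. To score a single sentence, you can use a sentence-level version of BLEU that applies some kind of smoothing, but the results are still far from ideal. You can also use a character-based metric such as chrF (sacrebleu -m chrf).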

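You can also pass use_effective_order=True to corpus_bleu, so that only the n-gram orders the hypothesis actually contains are counted instead of always all four. In that case, however, the metric is not exactly what people mean by BLEU.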
