Problem description
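I have a text summarization project in which I summarize hundreds of texts one after another. I also compute ROUGE scores for these summaries. However, I need to collect the ROUGE scores in a list before I can compute statistics on them, and I can't figure out how to do that. Can you help?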
from rouge_score import rouge_scorer
scorer = rouge_scorer.RougeScorer(['rouge1'])
scores = []
scores.append(scorer.score(hyp,ref))
Sample result:
[{'rouge1': Score(precision=0.46017699115044247, recall=0.45217391304347826, fmeasure=0.45614035087719296)}, {'rouge1': Score(precision=0.1693121693121693, recall=0.2831858407079646, fmeasure=0.21192052980132448)}]
However, I can't access the values inside these results directly.
Solution
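If you want to access the Score object directly, index the result with its dictionary key ('rouge1'). scorer.score() returns a dictionary that maps each requested ROUGE type to a Score named tuple with the fields precision, recall and fmeasure, which is why each entry in your list is a dict rather than a Score.
So scores.append(scorer.score(hyp,ref)) becomes scores.append(scorer.score(ref,hyp)['rouge1']).
Note that RougeScorer.score takes the reference (target) first and the hypothesis (prediction) second; if the arguments are swapped, precision and recall are swapped as well.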
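Here is a minimal sketch of what indexing with 'rouge1' gives you. So that it runs without rouge_score installed, it assumes a stand-in namedtuple with the same three fields as the Score objects in the sample result from the question, and it reuses those sample numbers. Once each list entry is a Score rather than a dict, its fields can be read by name or transposed into one sequence per measure.

```python
from collections import namedtuple

# Stand-in for rouge_score's Score type (assumption: same field names as
# in the sample result: precision, recall, fmeasure).
Score = namedtuple("Score", ["precision", "recall", "fmeasure"])

# What scorer.score(...) returns for each pair: a dict keyed by ROUGE type.
raw_results = [
    {"rouge1": Score(precision=0.46017699115044247, recall=0.45217391304347826,
                     fmeasure=0.45614035087719296)},
    {"rouge1": Score(precision=0.1693121693121693, recall=0.2831858407079646,
                     fmeasure=0.21192052980132448)},
]

# Indexing each dict with 'rouge1' leaves a flat list of Score tuples.
scores = [result["rouge1"] for result in raw_results]

# Fields can be read by attribute name...
fmeasures = [s.fmeasure for s in scores]

# ...or the list can be transposed into one tuple per field.
precisions, recalls, fmeasures_by_zip = zip(*scores)

print(fmeasures)   # [0.45614035087719296, 0.21192052980132448]
print(precisions)  # (0.46017699115044247, 0.1693121693121693)
```

Transposing with zip(*scores) works because each Score is an ordinary tuple, so the field order (precision, recall, fmeasure) is preserved.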
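The following code is a more general version that computes the ROUGE metric for each document pair and stores each measurement in its own list inside a single dictionary: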
# import the rouge_score package (installed with: pip install rouge-score)
from rouge_score import rouge_scorer
# a list of the hypothesis (generated summary) documents
hyp = ['This is the first sample', 'This is another example']
# a list of the reference documents
ref = ['This is the first sentence', 'It is one more sentence']
# make a RougeScorer object with rouge_types=['rouge1']
scorer = rouge_scorer.RougeScorer(['rouge1'])
# a dictionary that will contain the results
results = {'precision': [], 'recall': [], 'fmeasure': []}
# for each (hypothesis, reference) document pair
for h, r in zip(hyp, ref):
    # compute ROUGE; score() takes the reference (target) first, then the hypothesis (prediction)
    score = scorer.score(r, h)
    # unpack the Score named tuple into its three measurements
    precision, recall, fmeasure = score['rouge1']
    # add them to the proper list in the dictionary
    results['precision'].append(precision)
    results['recall'].append(recall)
    results['fmeasure'].append(fmeasure)
print(results)
The output will look like this:
{'precision': [0.8, 0.25], 'recall': [0.8, 0.2], 'fmeasure': [0.8000000000000002, 0.22222222222222224]}
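To see where numbers like these come from: ROUGE-1 precision is the clipped unigram overlap divided by the number of hypothesis tokens, recall is the same overlap divided by the number of reference tokens, and the F-measure is their harmonic mean. For the second pair only "is" overlaps, so precision is 1/4 and recall is 1/5. The sketch below recomputes both pairs in plain Python. It is an approximation, not rouge_score's own code: it assumes rouge_score's default tokenization (lowercase, split on non-alphanumeric characters, no stemming), and the helper names tokenize and rouge1 are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    # Approximation of rouge_score's default tokenizer (assumption):
    # lowercase, turn runs of non-alphanumeric characters into spaces, split.
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).split()


def rouge1(reference, hypothesis):
    """Hypothetical helper: ROUGE-1 (precision, recall, fmeasure), reference first."""
    ref_counts = Counter(tokenize(reference))
    hyp_counts = Counter(tokenize(hypothesis))
    # Clipped overlap: each word counts at most min(ref count, hyp count) times.
    overlap = sum((ref_counts & hyp_counts).values())
    # Guard against division by zero for empty texts.
    precision = overlap / max(sum(hyp_counts.values()), 1)
    recall = overlap / max(sum(ref_counts.values()), 1)
    if precision + recall > 0:
        fmeasure = 2 * precision * recall / (precision + recall)
    else:
        fmeasure = 0.0
    return precision, recall, fmeasure


hyp = ['This is the first sample', 'This is another example']
ref = ['This is the first sentence', 'It is one more sentence']

for h, r in zip(hyp, ref):
    print(rouge1(r, h))
# (0.8, 0.8, 0.8000000000000002)
# (0.25, 0.2, 0.22222222222222224)
```

The trailing ...0002 in the first F-measure is ordinary floating-point rounding in 2 * p * r / (p + r), not a difference in the metric.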
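In addition, I would recommend the rouge library, which is another implementation of the ROUGE paper. Its results may differ slightly, but it offers some useful features, including the ability to compute ROUGE metrics by passing in the whole set of documents at once and averaging the results over all of them.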
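If you stay with rouge_score, averaging over all documents (and the other statistics the question asks about) only needs the standard library once the per-document lists are collected in a dictionary like results. The sketch below uses a hypothetical helper, summarize, and placeholder scores for three documents; it reports the per-document (macro) mean and the sample standard deviation of each measure.

```python
import statistics


def summarize(results):
    """Hypothetical helper: mean and sample standard deviation per measure.

    `results` is assumed to map measure names to lists of per-document
    scores, e.g. {'precision': [...], 'recall': [...], 'fmeasure': [...]}.
    """
    summary = {}
    for measure, values in results.items():
        summary[measure] = {
            "mean": statistics.mean(values),
            # stdev needs at least two data points; report 0.0 otherwise
            "stdev": statistics.stdev(values) if len(values) > 1 else 0.0,
        }
    return summary


# Placeholder per-document scores for three documents (not real output).
results = {
    "precision": [0.5, 0.25, 0.75],
    "recall": [0.5, 0.5, 0.5],
    "fmeasure": [0.5, 0.3, 0.6],
}

for measure, stats in summarize(results).items():
    print(f"{measure}: mean={stats['mean']:.3f}, stdev={stats['stdev']:.3f}")
# precision: mean=0.500, stdev=0.250
# recall: mean=0.500, stdev=0.000
# fmeasure: mean=0.467, stdev=0.153
```

statistics.stdev is the sample standard deviation; use statistics.pstdev instead if you treat your documents as the whole population.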