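Problem description
I have about 200 candidate sentences, and for each one I want to measure the BLEU score by comparing it against several thousand reference sentences. The references are the same for every candidate. This is what I do at the moment: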
ref_for_all = [reference] *len(sents)
score = corpus_bleu(ref_for_all,[i.split() for i in sents],weights=(0,1,0))
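Here, reference contains the whole corpus I want to compare each sentence against, and sents holds my (candidate) sentences. Unfortunately, this takes too long, and given the experimental nature of my code I cannot wait that long for results. Is there another way (for example, using regular expressions) to get these scores faster? I have the same problem with ROUGE, so any suggestions there are very welcome too!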
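Solution
After searching for and trying out different packages, and timing how long each one took to compute the scores, I found nltk's corpus_bleu and PyRouge to be the most efficient. Keep in mind that each record has more than one hypothesis, which is why I compute the mean once per record and then the overall mean. This is what I did for BLEU: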
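    import numpy as np
    from nltk.translate.bleu_score import SmoothingFunction, corpus_bleu

    # cc is assumed to be an instance of nltk's SmoothingFunction
    cc = SmoothingFunction()

    # ref holds the reference sentences as strings; tokenize them once
    reference = [[i.split() for i in ref]]

    def find_my_bleu(text, w):
        # Score a single candidate against all references with n-gram weights w
        candidates_ = [text.split()]
        return corpus_bleu(reference, candidates_, weights=w, smoothing_function=cc.method4)

    def get_final_bleu(output_df):
        print('Started calculating the bleu scores...')
        output_df.loc[:, 'bleu_1'] = output_df.loc[:, 'final_predicted_verses'].apply(lambda x: [find_my_bleu(t, (1, 0)) for t in x])
        # Weights for bleu_2 and bleu_3 are assumed to be bigram-only and trigram-only
        output_df.loc[:, 'bleu_2'] = output_df.loc[:, 'final_predicted_verses'].apply(lambda x: [find_my_bleu(t, (0, 1, 0)) for t in x])
        output_df.loc[:, 'bleu_3'] = output_df.loc[:, 'final_predicted_verses'].apply(lambda x: [find_my_bleu(t, (0, 0, 1)) for t in x])
        print('Now the average score...')
        output_df.loc[:, 'bleu_3_mean'] = output_df.loc[:, 'bleu_3'].apply(lambda x: np.mean(x))
        output_df.loc[:, 'bleu_2_mean'] = output_df.loc[:, 'bleu_2'].apply(lambda x: np.mean(x))
        output_df.loc[:, 'bleu_1_mean'] = output_df.loc[:, 'bleu_1'].apply(lambda x: np.mean(x))
        print('mean bleu_3 score: ', np.mean(output_df.loc[:, 'bleu_3_mean']))
        print('mean bleu_2 score: ', np.mean(output_df.loc[:, 'bleu_2_mean']))
        print('mean bleu_1 score: ', np.mean(output_df.loc[:, 'bleu_1_mean']))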
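Because the reference set is identical for every candidate, the reference n-gram counts only need to be built once. The sketch below is not nltk's corpus_bleu: it is a plain-Python illustration of the clipped n-gram precision that BLEU is built on, with no brevity penalty and no smoothing, and the names `build_reference_counts` and `clipped_precision` are hypothetical. With weights such as `(0, 1, 0)` only one n-gram order carries weight in the precision term, so caching the counts for that order may be enough for a quick experimental comparison.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def build_reference_counts(references, n):
    """For each n-gram, keep the maximum count seen in any single reference.

    This runs once and the result is reused for every candidate.
    """
    max_counts = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref, n)).items():
            if count > max_counts[gram]:
                max_counts[gram] = count
    return max_counts


def clipped_precision(candidate, max_counts, n):
    """Clipped (modified) n-gram precision of one tokenized candidate."""
    cand_counts = Counter(ngrams(candidate, n))
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    clipped = sum(min(count, max_counts[gram]) for gram, count in cand_counts.items())
    return clipped / total


# Toy data standing in for the real corpus and candidates.
references = [s.split() for s in ["the cat sat on the mat", "a dog sat on the rug"]]
candidates = ["the cat sat on the rug", "birds fly high"]

bigram_counts = build_reference_counts(references, 2)  # computed once
scores = [clipped_precision(c.split(), bigram_counts, 2) for c in candidates]
print(scores)  # [1.0, 0.0]
```

The numbers will not match nltk's smoothed BLEU exactly, so this is only useful for relative comparisons between candidates.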
For ROUGE:
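    # Assumed import: PyRouge from the rouge-metric package
    from rouge_metric import PyRouge

    rouge = PyRouge(rouge_n=(1, 2), rouge_l=True, rouge_w=False, rouge_s=False, rouge_su=False)

    def find_my_rouge(text):
        # reference_rouge: the tokenized reference sentences, defined elsewhere
        hypotheses = [[text.split()]]
        score = rouge.evaluate_tokenized(hypotheses, [[reference_rouge]])
        return score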
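Each score is a nested dictionary with `r`, `p` and `f` entries per metric, which is the shape the averaging code relies on. As a rough illustration of what those three numbers mean, here is a plain-Python sketch of single-reference ROUGE-N. It is not PyRouge's implementation (which also handles multiple references and ROUGE-L), and `rouge_n` is a hypothetical helper name.

```python
from collections import Counter


def rouge_n(hypothesis, reference, n=1):
    """Single-reference ROUGE-N: recall, precision and F1 from n-gram overlap."""
    def count_ngrams(tokens):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    hyp_counts = count_ngrams(hypothesis)
    ref_counts = count_ngrams(reference)
    # An n-gram counts towards the overlap at most as often as it occurs on both sides.
    overlap = sum((hyp_counts & ref_counts).values())

    recall = overlap / max(sum(ref_counts.values()), 1)
    precision = overlap / max(sum(hyp_counts.values()), 1)
    f_score = 0.0 if overlap == 0 else 2 * recall * precision / (recall + precision)
    return {'r': recall, 'p': precision, 'f': f_score}


score = rouge_n("the cat sat".split(), "the cat sat on the mat".split(), n=1)
print(score)
```

Here the hypothesis covers three of the six reference unigrams, so recall is 0.5 while precision is 1.0.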
Then take the mean over everything:
    def get_short_rouge(list_dicts):
        """ get the mean of all generated text for each record"""
        l_r = 0
        l_p = 0
        l_f = 0
        one_r = 0
        one_p = 0
        one_f = 0
        two_r = 0
        two_p = 0
        two_f = 0
        for d in list_dicts:
            one_r += d['rouge-1']['r']
            one_p += d['rouge-1']['p']
            one_f += d['rouge-1']['f']
            two_r += d['rouge-2']['r']
            two_p += d['rouge-2']['p']
            two_f += d['rouge-2']['f']
            l_r += d['rouge-l']['r']
            l_p += d['rouge-l']['p']
            l_f += d['rouge-l']['f']
        length = len(list_dicts)
        return {'rouge-1': {'r': one_r/length, 'p': one_p/length, 'f': one_f/length},
                'rouge-2': {'r': two_r/length, 'p': two_p/length, 'f': two_f/length},
                'rouge-l': {'r': l_r/length, 'p': l_p/length, 'f': l_f/length}}
    def get_overal_rouge_mean(output_df):
        print('Started getting the overall rouge of each record...')
        output_df.loc[:, 'rouge_mean'] = output_df.loc[:, 'rouge'].apply(lambda x: get_short_rouge(x))
        print('Started getting the overall rouge of all record...')
        l_r = 0
        l_p = 0
        l_f = 0
        one_r = 0
        one_p = 0
        one_f = 0
        two_r = 0
        two_p = 0
        two_f = 0
        for i in range(len(output_df)):
            d = output_df.loc[i, 'rouge_mean']
            one_r += d['rouge-1']['r']
            one_p += d['rouge-1']['p']
            one_f += d['rouge-1']['f']
            two_r += d['rouge-2']['r']
            two_p += d['rouge-2']['p']
            two_f += d['rouge-2']['f']
            l_r += d['rouge-l']['r']
            l_p += d['rouge-l']['p']
            l_f += d['rouge-l']['f']
        length = len(output_df)
        print('overall rouge scores: ')
        print({'rouge-1': {'r': one_r/length, 'p': one_p/length, 'f': one_f/length},
               'rouge-2': {'r': two_r/length, 'p': two_p/length, 'f': two_f/length},
               'rouge-l': {'r': l_r/length, 'p': l_p/length, 'f': l_f/length}})
        return output_df
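Both averaging functions above repeat the same nine accumulators. A more compact alternative, sketched here under the assumption that every score dictionary has the same metric -> {'r', 'p', 'f'} layout, is a single helper that averages any list of such dictionaries. `mean_rouge` is a hypothetical name and not part of PyRouge.

```python
def mean_rouge(list_dicts):
    """Average a list of nested ROUGE dicts key by key."""
    if not list_dicts:
        raise ValueError("cannot average an empty list of scores")
    length = len(list_dicts)
    return {
        metric: {
            key: sum(d[metric][key] for d in list_dicts) / length
            for key in list_dicts[0][metric]
        }
        for metric in list_dicts[0]
    }


# Two made-up score dicts, for illustration only.
scores = [
    {'rouge-1': {'r': 0.5, 'p': 1.0, 'f': 0.5}, 'rouge-l': {'r': 0.25, 'p': 0.5, 'f': 0.25}},
    {'rouge-1': {'r': 1.0, 'p': 0.5, 'f': 0.75}, 'rouge-l': {'r': 0.75, 'p': 1.0, 'f': 0.5}},
]
averaged = mean_rouge(scores)
print(averaged)
```

The same helper could serve both steps: once per record on the list of hypothesis scores, and once more on the list of per-record means.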
I hope this helps anyone who runs into this problem.