Problem description
I'm trying to achieve the following:
Given two strings:
string1 = "Hello- my name is World. I love Java"
string2 = "Helo- my anme is World. I love Java"
Desired output:
{"before": ["Hello", "name"], "after": ["Helo", "anme"]}
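Currently I'm trying to do this with the Differ class from the difflib library, which gives a character-based diff, but I'm not sure how to capture the changed words rather than just the changed characters. My current implementation is: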
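import difflib

def change_comparison_report_generator(input_before_pages, input_after_pages):
    # Each argument is a list of page strings; page i of "before" is compared with page i of "after".
    diff_list = []
    d = difflib.Differ()
    for i in range(len(input_before_pages)):
        # Differ compares line by line and marks changed characters on the "?" lines.
        diff = d.compare(input_before_pages[i].splitlines(), input_after_pages[i].splitlines())
        output = '\n'.join(diff)
        diff_list.append(output)
    return diff_list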
With the same input as above, I get the following output:
- Hello- my name is World. I love Java
? ^^^^^ ^^^^
+ Helo- my anme is World. I love Java
? ^^^^ ^^^^
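One hint, offered as a sketch: Differ.compare accepts any two sequences of strings, not just lines. If you pass it lists of words instead of lists of lines, each diff entry becomes a whole word, so the changed words show up directly as "- " and "+ " entries. This assumes the input strings are the ones shown in the diff output above.

```python
import difflib

string1 = "Hello- my name is World. I love Java"
string2 = "Helo- my anme is World. I love Java"

# Compare word lists instead of line lists, so each entry is a whole word.
# Entries start with "  " (unchanged), "- " (only in string1),
# "+ " (only in string2) or "? " (intraline hint).
diff = list(difflib.Differ().compare(string1.split(), string2.split()))
for entry in diff:
    print(entry)
```

Collecting the words from the "- " and "+ " entries then gives the "before" and "after" lists.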
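My question: how can I capture the changed words as described above and output them as a dictionary? Any help or hints are appreciated, thanks!
Solution
I would try something simple: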
# This will split words at spaces and keep other characters.
words1 = string1.split()
words2 = string2.split()
# Alternative option: a regular expression that keeps only word characters (use one option or the other)
import re
words1 = re.findall(r"\b\w+\b",string1)
words2 = re.findall(r"\b\w+\b",string2)
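To show why the choice of tokenizer matters, here is a small sketch (using the sample string1 from the question) comparing the two options. With split() the punctuation stays glued to the word, so "Hello-" rather than "Hello" would end up in the result.

```python
import re

string1 = "Hello- my name is World. I love Java"

# str.split() breaks only on whitespace, so punctuation stays attached.
split_words = string1.split()
# The regex keeps only runs of word characters, dropping "-" and ".".
regex_words = re.findall(r"\b\w+\b", string1)

print(split_words)
print(regex_words)
```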
Then zip the two lists together and compare each pair of words:
words = {"before": [], "after": []}
for w1, w2 in zip(words1, words2):
    if w1 != w2:
        words["before"].append(w1)
        words["after"].append(w2)
Output:
{'before': ['Hello','name'],'after': ['Helo','anme']}
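One caveat with zip: it pairs words purely by position, so if a word is inserted or deleted, every later pair is shifted and reported as changed. A possible variation, sketched below with a hypothetical helper named word_diff, aligns the two word lists with difflib.SequenceMatcher and keeps only the non-equal spans. This is a different technique from the zip approach above, not a drop-in equivalent.

```python
import difflib
import re


def word_diff(before, after):
    """Return changed words as {"before": [...], "after": [...]}.

    Hypothetical helper: aligns the word lists with SequenceMatcher so an
    inserted or deleted word does not shift every later comparison.
    """
    words1 = re.findall(r"\b\w+\b", before)
    words2 = re.findall(r"\b\w+\b", after)
    result = {"before": [], "after": []}
    matcher = difflib.SequenceMatcher(a=words1, b=words2, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # 'replace', 'delete' or 'insert'; one side may be empty.
            result["before"].extend(words1[i1:i2])
            result["after"].extend(words2[j1:j2])
    return result


print(word_diff("Hello- my name is World. I love Java",
                "Helo- my anme is World. I love Java"))
print(word_diff("I love Java", "I really love Java"))
```

For the question's strings this gives the same result as the zip version. For the second call it reports only the inserted word "really", whereas zip would also flag "love"/"Java" as changed. The trade-off is that the two lists are no longer guaranteed to pair up one-to-one.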