Problem description
Suppose we have two strings and need to find the words that appear in both.
str1 = "hit hop hat"
str2 = "hot has hit hop"
output = ["hit","hop"]
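I know we can simply split each string, turn the words into sets, and take the intersection. My question is: how can we optimize for space? What if many of the words share a common prefix?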
Solution
Here is one way to approach this: build a simplified trie from the shorter word list, then look up each word of the longer list in it:
def create_simplified_trie(words):
    trie = {}
    for word in words:
        curr = trie
        for c in word:
            if c not in curr:
                curr[c] = {}
            curr = curr[c]
        # Mark the end of a word
        curr['#'] = True
    return trie

str1 = "hit hop hat"
str2 = "hot has hit hop"
words1 = str1.split()
words2 = str2.split()

# Ensure words1 is the smaller length list
if len(words1) > len(words2):
    words1, words2 = words2, words1

words1_trie = create_simplified_trie(words1)

output = []
for word in words2:
    curr = words1_trie
    found_prefix = True
    for c in word:
        if c not in curr:
            found_prefix = False
            break
        curr = curr[c]
    if found_prefix and '#' in curr:
        output.append(word)

print(output)
Output:
['hit','hop']
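To get a rough sense of how much the shared prefixes save, one could count the trie's nodes and compare that with the total number of characters in the word list. The sketch below is only an illustration: `count_nodes` is a hypothetical helper that is not part of the answer above, and node count is merely a proxy for memory, since every Python dict carries its own overhead.

```python
def create_simplified_trie(words):
    """Build a nested-dict trie; '#' marks the end of a word."""
    trie = {}
    for word in words:
        curr = trie
        for c in word:
            curr = curr.setdefault(c, {})
        curr['#'] = True
    return trie


def count_nodes(trie):
    """Hypothetical helper: count character nodes, skipping '#' end markers."""
    total = 0
    for key, child in trie.items():
        if key != '#':
            total += 1 + count_nodes(child)
    return total


words = "hit hop hat".split()
trie = create_simplified_trie(words)

total_chars = sum(len(w) for w in words)  # characters stored without sharing
node_count = count_nodes(trie)            # characters stored in the trie

print(total_chars, node_count)
```

For these three words the shared leading `h` is stored once, so the trie holds fewer character nodes than the raw word list has characters. Whether that translates into a real memory saving depends on how the nodes are represented.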
A simpler approach is to build a set of words from each string and take the intersection of the two sets.
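The set-based approach can be sketched as follows. This is a minimal illustration using the example strings from the question; sorting the result is an added assumption, made only so the output order is deterministic, since sets are unordered.

```python
str1 = "hit hop hat"
str2 = "hot has hit hop"

# Split each string into words and keep the words present in both sets
common = set(str1.split()) & set(str2.split())

# Sets have no defined order, so sort for a stable result
output = sorted(common)
print(output)
```

This is shorter than the trie version, but it stores every word in full, so it does not benefit from shared prefixes.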