How can we find the common words between two strings?

Problem description

Suppose we have two strings and need to find the words they have in common.

str1 = "hit hop hat"
str2 = "hot has hit hop"

output = ["hit","hop"]

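I know we can simply split the strings, turn each word list into a set, and take the intersection. My question is: how can we optimize for space? What if many of the words share common prefixes?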

Solution

Here is one way to solve this problem: build a simplified trie from the smaller word list, then look up each word of the longer list in that trie:

def create_simplified_trie(words):
    trie = {}
    for word in words:
        curr = trie
        for c in word:
            if c not in curr:
                curr[c] = {}
            curr = curr[c]
        # Mark the end of a word
        curr['#'] = True  
    return trie

str1 = "hit hop hat"
str2 = "hot has hit hop"
words1 = str1.split()
words2 = str2.split()
# Ensure words1 is the shorter list, so the trie built from it stays small
if len(words1) > len(words2):
    words1, words2 = words2, words1

words1_trie = create_simplified_trie(words1)

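output = []
for word in words2:
    # Walk down the trie one character at a time
    curr = words1_trie
    found_prefix = True
    for c in word:
        if c not in curr:
            # No branch for this character, so the word is not in words1
            found_prefix = False
            break
        curr = curr[c]
    # A full path matched; it is a common word only if a word ends here
    if found_prefix and '#' in curr:
        output.append(word)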

print(output)

Output:

['hit', 'hop']
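The space argument for the trie rests on shared prefixes being stored only once. As a rough illustration (a sketch, not part of the original answer), the hypothetical helper `count_nodes` below counts the character nodes in a trie built the same way as `create_simplified_trie` (rewritten compactly with `dict.setdefault` so the block runs on its own) and compares that with the total number of characters in the word list.

```python
def create_simplified_trie(words):
    # Same structure as the answer's trie: nested dicts, '#' marks the end of a word
    trie = {}
    for word in words:
        curr = trie
        for c in word:
            curr = curr.setdefault(c, {})
        curr['#'] = True
    return trie


def count_nodes(trie):
    # Hypothetical helper: count character nodes, skipping the '#' end markers
    total = 0
    for key, child in trie.items():
        if key == '#':
            continue
        total += 1 + count_nodes(child)
    return total


words = "hit hop hat".split()
trie = create_simplified_trie(words)
print(count_nodes(trie))           # 7 character nodes
print(sum(len(w) for w in words))  # 9 characters stored as separate strings
```

Here the three words share the prefix `h`, so the trie needs 7 character nodes instead of 9 characters. This counts nodes, not bytes: in CPython every nested dict carries its own per-object overhead, so whether a dict-based trie actually uses less memory than a set of short strings depends on how much prefix sharing the data has, and is worth measuring on real input.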

A simpler approach is to build a set of words from each string and take their intersection.
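A minimal sketch of that set-based approach, using the example strings from the question. Sets do not preserve order, so the result is sorted here to get a deterministic output; the sorting step is an assumption, not part of the original answer.

```python
str1 = "hit hop hat"
str2 = "hot has hit hop"

# Build a set of words from each string and intersect them
common = set(str1.split()) & set(str2.split())

# Sets are unordered, so sort for a stable, readable result
output = sorted(common)
print(output)  # ['hit', 'hop']
```

This keeps every distinct word from both strings in memory in full, which is the space cost the trie approach tries to reduce when many words share prefixes.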
