How can we find the common words between two strings?

Problem description

Suppose we have two strings and need to find the words they have in common.

str1 = "hit hop hat"
str2 = "hot has hit hop"

output = ["hit","hop"]

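I know we can simply split the strings, turn the words into sets, and take the intersection. My question is how we can optimize for space. What if many of the strings share common prefixes?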

Solution

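Here is one way to approach this: build a simplified trie from the shorter word list, then look up each word of the longer list in it. Words that share a prefix share the same trie nodes, which is where the space saving comes from: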

def create_simplified_trie(words):
    trie = {}
    for word in words:
        curr = trie
        for c in word:
            if c not in curr:
                curr[c] = {}
            curr = curr[c]
        # Mark the end of a word
        curr['#'] = True  
    return trie

str1 = "hit hop hat"
str2 = "hot has hit hop"
words1 = str1.split()
words2 = str2.split()
# Ensure words1 is the smaller length list
if len(words1) > len(words2):
    words1, words2 = words2, words1

words1_trie = create_simplified_trie(words1)

output = []
for word in words2:
    curr = words1_trie
    found_prefix = True
    for c in word:
        if c not in curr:
            found_prefix = False
            break
        curr = curr[c]
    if found_prefix and '#' in curr:
        output.append(word)

print(output)

Output:

['hit', 'hop']

A simpler approach uses the sets of words and their intersection.
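A minimal sketch of that set-based approach, assuming words are separated by whitespace. The helper name common_words is hypothetical and not taken from the original answer, and the result is sorted only because sets have no defined order:

```python
def common_words(s1, s2):
    # Split each string on whitespace and intersect the two word sets.
    return set(s1.split()) & set(s2.split())

str1 = "hit hop hat"
str2 = "hot has hit hop"

# Sets are unordered, so sort to get a stable, printable list.
output = sorted(common_words(str1, str2))
print(output)
```

This stores every distinct word of both strings in full, so it does not benefit from shared prefixes the way the trie does.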
