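Is it possible to find the specified substrings in a string in one iteration, or faster than O(n * m)?

Problem description

I have a string and a list of unique substrings. The task is to determine which of the substrings occur in the string.

This can be done with just two nested loops (the in operator hides the inner one):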

result = []
substrings = ['foo','bar','spam','eggs']
text = 'foo123123spameggsabcde'

for s in substrings:
    if s in text:
        result.append(s)

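But this is slow, especially for a long string and many substrings. Is there a way to do this more efficiently?

Solution

Using SomeDude's algorithm from this similar question, the following should work quite efficiently: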

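# Distinct lengths that occur among the substrings
lens = {len(s) for s in substrings}
# Every window of each of those lengths in the text; the +1 keeps the last window
s = set()
for k in lens:
    s.update(text[i:i+k] for i in range(len(text) - k + 1))
result = list(s.intersection(substrings))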

print(result)

['foo', 'spam', 'eggs']  # order may vary, since the result comes from a set

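Explanation: we collect every distinct length found among the substrings. For each of those lengths, we build all windows of that length in the text (the set s). Finally, we take the items common to s and substrings, which is the answer to the question.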
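The same idea can be sketched as a reusable function that tests each window against the set of wanted substrings directly, rather than first storing every window of the text. This is only a sketch: the name find_present and its parameters are made up for this illustration and do not come from the original answer. It does roughly one set lookup per text position per distinct substring length, so it helps most when the substrings share only a few lengths.

```python
def find_present(text, substrings):
    """Return the members of `substrings` that occur in `text`."""
    wanted = set(substrings)
    lengths = {len(s) for s in wanted}
    found = set()
    for k in lengths:
        # Slide a window of width k over the text; the +1 includes the final window.
        for i in range(len(text) - k + 1):
            window = text[i:i + k]
            if window in wanted:
                found.add(window)
    # Report matches in the order of the input list so the result is deterministic.
    return [s for s in substrings if s in found]


substrings = ['foo', 'bar', 'spam', 'eggs']
text = 'foo123123spameggsabcde'
print(find_present(text, substrings))  # ['foo', 'spam', 'eggs']
```

Returning the matches in input order avoids the arbitrary ordering you get from converting a set to a list.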

