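Problem description
I have a list of items that was created incorrectly. Instead of copying each whole item once, the process made multiple partial copies of the same item. The partial duplicates are mixed in with other duplicates and some unique items. For example, list a: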
a = ['one two','one two three four','one two three','five six','five six seven','eight nine']
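I want to remove the partial duplicates and keep only the longest form of each item. For example, I want to produce list b:
b = ['one two three four', 'five six seven', 'eight nine']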
Each item must stay intact. The result must not turn into something like:
c = ['two one three four', 'wife six seven', 'eight nine']
Solution
Try this:
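def group_partials(strings):
    """Yield each string that is not a prefix of the next string in sorted order."""
    it = iter(sorted(strings))
    try:
        prev = next(it)
    except StopIteration:
        return  # empty input: nothing to yield
    for s in it:
        # Sorting places a prefix directly before the strings that extend it,
        # so prev is complete only if s does not start with it.
        if not s.startswith(prev):
            yield prev
        prev = s
    yield prev  # the last string in sorted order is never a prefix of a later one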
a = ['one two','one two three','one two three four','five six','five six seven','eight nine']
b = list(group_partials(a))
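The generator above compares raw character prefixes and yields results in sorted order, so 'five six' would also be treated as a partial copy of 'five sixty'. If that matters, here is a minimal sketch of a variant that compares whole words and keeps the input order. The name `drop_partial_items` is hypothetical and not part of the original answer, and the sketch assumes items are whitespace-separated words. It runs in quadratic time.

```python
def drop_partial_items(items):
    """Keep items whose word sequence is not a proper prefix of another item's."""
    # Hypothetical helper: compare tuples of words rather than raw characters,
    # so 'five six' is not mistaken for a prefix of 'five sixty'.
    word_seqs = [tuple(item.split()) for item in items]
    unique = set(word_seqs)
    kept = []
    seen = set()
    for item, words in zip(items, word_seqs):
        if words in seen:
            continue  # exact duplicate of an item already handled
        seen.add(words)
        is_partial = any(
            len(other) > len(words) and other[:len(words)] == words
            for other in unique
        )
        if not is_partial:
            kept.append(item)
    return kept


a = ['one two', 'one two three four', 'one two three',
     'five six', 'five six seven', 'eight nine']
print(drop_partial_items(a))
# ['one two three four', 'five six seven', 'eight nine']
```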
You can use sets for this.
Try this code:
a = ['one two', 'one two three four', 'one two three', 'five six', 'five six seven', 'eight nine']
# check for subsets
for i in range(len(a)):
    for j in range(len(a)):
        if i == j:
            continue  # same index
        words_i = set(a[i].split())
        words_j = set(a[j].split())
        if words_i <= words_j:  # every word of a[i] also appears in a[j]
            a[i] = ""  # clear string
            break
# remove the cleared (empty) strings
a = [x for x in a if x]
print(a)
Output:
['one two three four', 'five six seven', 'eight nine']
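The same set comparison can also be wrapped in a function that leaves the input list unchanged. This is a minimal sketch; `drop_word_subsets` is a hypothetical name, not part of the original answer. Because sets ignore word order and repeated words, this approach would also treat 'two one' as a partial copy of 'one two three'.

```python
def drop_word_subsets(items):
    """Return items whose word set is not contained in another item's word set."""
    word_sets = [set(item.split()) for item in items]
    kept = []
    for i, words in enumerate(word_sets):
        covered = any(
            # Drop proper subsets; among items with equal word sets keep the first.
            words < other or (words == other and j < i)
            for j, other in enumerate(word_sets)
            if j != i
        )
        if not covered:
            kept.append(items[i])
    return kept


items = ['one two', 'one two three four', 'one two three',
         'five six', 'five six seven', 'eight nine']
print(drop_word_subsets(items))
# ['one two three four', 'five six seven', 'eight nine']
```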