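Python: removing partial duplicates from a list

Problem description

I have a list of items that was created incorrectly. Instead of copying each complete item once, the process produced several partial copies of the same item. These partial duplicates are mixed in with other duplicates and some unique items. For example, take list a: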

a = ['one two','one two three four','one two three','five six','five six seven','eight nine']

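I want to remove the partial duplicates and keep only the longest form of each item. For example, I want to produce list b:

b = ['one two three four','five six seven','eight nine']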

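The integrity of each item must be preserved. The words inside an item must not be altered or reordered, so the result must not turn into something like:

c = ['two one three four','five six seven','eight nine']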

Solution

Try this:

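def group_partials(strings):
    # Sorting places each partial string directly before the strings that extend it.
    it = iter(sorted(strings))
    prev = next(it, None)
    if prev is None:
        return  # empty input: nothing to yield
    for s in it:
        if not s.startswith(prev):
            yield prev  # nothing extends prev, so it is the longest form
        prev = s
    yield prev  # the last string cannot be a prefix of a later one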

a = ['one two','one two three','one two three four','five six','five six seven','eight nine']
b = list(group_partials(a))
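
Note that the generator above returns the results in sorted order and compares raw characters, so 'one two' would also count as a partial of 'one twofold'. If the original order matters, or matches should only happen on whole words, a variant along these lines might work. This is a minimal sketch, not part of the original answer; the name drop_partials is hypothetical, and it assumes items are whitespace-separated words.

```python
def drop_partials(strings):
    """Hypothetical helper: keep strings whose words are not a proper prefix of another item."""
    tokenized = [s.split() for s in strings]
    result = []
    seen = set()
    for s, words in zip(strings, tokenized):
        if s in seen:
            continue  # skip exact duplicates
        # s is a partial if some longer item starts with exactly the same words
        is_partial = any(
            len(other) > len(words) and other[:len(words)] == words
            for other in tokenized
        )
        if not is_partial:
            result.append(s)
            seen.add(s)
    return result


a = ['one two', 'one two three four', 'one two three', 'five six', 'five six seven', 'eight nine']
print(drop_partials(a))
```

This compares every pair of items, so it is quadratic in the list length; for large lists the sorted approach above is likely the better starting point.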

You can use sets for this.

Try this code:

a = ['one two','one two three four','one two three','five six','five six seven','eight nine']

# Blank out every string whose words are all contained in another string
for i in range(len(a)):
    for j in range(len(a)):
        if i == j:
            continue  # do not compare a string with itself
        if set(a[i].split()) <= set(a[j].split()):  # words of a[i] are a subset of a[j]'s words
            a[i] = ""  # mark as a partial duplicate
            break

# Keep only the strings that were not blanked out
a = [x for x in a if x]

print(a)

Output:

['one two three four', 'five six seven', 'eight nine']
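
One caveat with the set-based test: sets ignore word order, so an item such as 'two one' would be treated as a partial duplicate of 'one two three'. The short sketch below contrasts the two checks; the helper names is_word_subset and is_word_prefix are hypothetical and only illustrate the difference.

```python
def is_word_subset(short, long):
    # True if every word of `short` appears somewhere in `long`, in any order
    return set(short.split()) <= set(long.split())


def is_word_prefix(short, long):
    # True only if `long` starts with the words of `short`, in the same order
    s, l = short.split(), long.split()
    return l[:len(s)] == s


print(is_word_subset('two one', 'one two three'))  # True
print(is_word_prefix('two one', 'one two three'))  # False
print(is_word_prefix('one two', 'one two three'))  # True
```

Which check is appropriate depends on whether reordered words can actually occur in the data.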