Most efficient way to select three lists from a list of lists so that the number of unique elements across them exceeds a threshold

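Problem description

What is the most efficient way to find out whether three lists can be selected from a list of lists such that the number of unique elements across all three combined is greater than a certain number? The best solution I could come up with is brute force: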

def maximum_number_of_unique_elements(mylist,threshold):
    for a in range(len(mylist)):
        for b in range(a,len(mylist)):
            for c in range(b,len(mylist)):
                if len(set(mylist[a] + mylist[b] + mylist[c])) >= threshold:
                    return True
    return False

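Example:

l = [[0,1,2],[0,1],[3,6],[3,7],[4,7]]
t = 7

maximum_number_of_unique_elements(l,t) returns True, because selecting lists 0, 2 and 4 produces a set with the numbers 0,1,2,3,4,6,7, which has 7 elements and so meets the threshold.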

Solution

A simpler solution is to use itertools.combinations_with_replacement, which lets you set the number of lists to use without changing the code (instead of writing one for loop per list):

from itertools import combinations_with_replacement,chain

def maximum_number_of_unique_elements(values,threshold,nb_list=3):
    for parts in combinations_with_replacement(values,r=nb_list):
        if len(set(chain.from_iterable(parts))) >= threshold:
            return True
    return False

With combinations_with_replacement each element can be repeated; use combinations if each element should be picked at most once:

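from itertools import combinations,combinations_with_replacement

print(list(combinations('ABC',r=2)))
# [('A','B'),('A','C'),('B','C')]

print(list(combinations_with_replacement('ABC',r=2)))
# [('A','A'),('A','B'),('A','C'),('B','B'),('B','C'),('C','C')]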
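If the three lists are meant to be distinct, the same idea can be sketched with combinations instead. This is only a minimal illustration: the function name has_three_distinct_lists is hypothetical and does not come from the original answer, and the sample input assumes the list from the question.

```python
from itertools import chain, combinations

def has_three_distinct_lists(values, threshold, nb_list=3):
    """Return True if some nb_list different lists together hold
    at least `threshold` unique elements (hypothetical helper)."""
    for parts in combinations(values, r=nb_list):
        # Flatten the chosen lists and count the distinct values.
        if len(set(chain.from_iterable(parts))) >= threshold:
            return True
    return False

example = [[0, 1, 2], [0, 1], [3, 6], [3, 7], [4, 7]]
print(has_three_distinct_lists(example, 7))  # True: lists 0, 2 and 4 give 7 unique values
print(has_three_distinct_lists(example, 8))  # False: only 7 distinct values exist in total
```

Note that with fewer than nb_list lists, combinations yields nothing and the function returns False, whereas the with-replacement version can reuse a list.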

Some optimizations:

def maximum_number_of_unique_elements(mylist,threshold):
    sets = list(map(set,mylist))
    n = len(sets)
    for i,one in enumerate(sets):
        for j in range(i,n):
            two = one | sets[j]
            thresh = threshold - len(two)
            for third in sets[j:]:
                if len(third - two) >= thresh:
                    return True
    return False
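The inner test appears to rely on the fact that the size of a three-way union equals the size of the first two sets' union plus the number of elements the third set adds. So `len(third - two) >= threshold - len(two)` is the same condition as `len(one | two | third) >= threshold`, without building the full union each time. A small sketch to check that identity, where the helper name union_size_incremental and the sample input are assumptions for illustration:

```python
from itertools import combinations_with_replacement

def union_size_incremental(a, b, c):
    """Size of a | b | c, computed as len(a | b) + len(c - (a | b))."""
    two = a | b
    return len(two) + len(c - two)

# Verify the identity on every triple (repetition allowed) of a small input.
sets = [set(x) for x in [[0, 1, 2], [0, 1], [3, 6], [3, 7], [4, 7]]]
for a, b, c in combinations_with_replacement(sets, 3):
    assert union_size_incremental(a, b, c) == len(a | b | c)
```

Converting every list to a set once up front and reusing the two-set union across the innermost loop is where the saving would be expected to come from.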

Benchmark results:

 7.68 us  original
 8.06 us  azro
 4.49 us  Manuel

 7.77 us  original
 8.01 us  azro
 4.46 us  Manuel

 7.78 us  original
 7.85 us  azro
 4.48 us  Manuel

Benchmark code:

from timeit import repeat
from functools import partial
from itertools import combinations_with_replacement,chain

def original(mylist,threshold):
    for a in range(len(mylist)):
        for b in range(a,len(mylist)):
            for c in range(b,len(mylist)):
                if len(set(mylist[a] + mylist[b] + mylist[c])) >= threshold:
                    return True
    return False

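def azro(values,threshold,nb_list=3):
    for parts in combinations_with_replacement(values,r=nb_list):
        if len(set(chain.from_iterable(parts))) >= threshold:
            return True
    return False

def Manuel(mylist,threshold):
    sets = list(map(set,mylist))
    n = len(sets)
    for i,one in enumerate(sets):
        for j in range(i,n):
            two = one | sets[j]
            thresh = threshold - len(two)
            for third in sets[j:]:
                if len(third - two) >= thresh:
                    return True
    return False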

def benchmark(*args):
    solutions = original,azro,Manuel
    number = 10 ** 5
    for _ in range(3):
        for solution in solutions:
            t = min(repeat(partial(solution,*args),number=number)) / number
            print('%5.2f us ' % (t * 1e6),solution.__name__)
        print()

benchmark([[0,1,2],[0,1],[3,6],[3,7],[4,7]],7)