Efficiently computing the XOR / symmetric difference of a list of sets

Problem description

I have an arbitrary number of Python sets, e.g.

>>> a = {1,2,3}
>>> b = {3,4,5}
>>> c = {5,6,7}
>>> d = {7,8,1}

I want to compute their "combined" symmetric difference, i.e. I want to XOR all of them:

>>> a ^ b ^ c ^ d
{2, 4, 6, 8}

In my use case, I am actually dealing with a list of sets:

>>> l = [a,b,c,d]
>>> l
[{1, 2, 3}, {3, 4, 5}, {5, 6, 7}, {1, 7, 8}]

Currently, I am iterating over the list to get what I want:

>>> res = l[0].copy()
>>> for item in l[1:]:
...     res.symmetric_difference_update(item)
>>> res
{2, 4, 6, 8}

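I am wondering whether there is a more efficient way, ideally one that does not go through a Python for loop. Set operations in Python are actually very fast, but my lists can get quite long, so ironically the for loop itself becomes the bottleneck.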

Edit (1)

I am assuming that every possible entry occurs in no more than two of the sets in the list.
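
Under that assumption, an entry survives the combined XOR exactly when it occurs in a single set, so the result can also be obtained by counting. The following is only a minimal sketch of that idea: the function name `xor_at_most_twice` is made up for illustration, and it is not benchmarked against the loops in this post, so it may or may not be faster. The counting happens inside `collections.Counter`, but the final filtering step is still a Python-level comprehension.

```python
from collections import Counter
from itertools import chain
from typing import Iterable, Set


def xor_at_most_twice(sets: Iterable[Set[int]]) -> Set[int]:
    """Symmetric difference, ASSUMING no entry occurs in more than two sets."""
    # Flatten all sets into one stream of entries and count the occurrences.
    counts = Counter(chain.from_iterable(sets))
    # Under the assumption, a count of 1 means the entry survives the XOR.
    return {entry for entry, n in counts.items() if n == 1}


a, b, c, d = {1, 2, 3}, {3, 4, 5}, {5, 6, 7}, {7, 8, 1}
print(xor_at_most_twice([a, b, c, d]) == a ^ b ^ c ^ d)  # True
```

If an entry can occur three or more times, this no longer matches `a ^ b ^ c ^ ...`, because XOR keeps entries with an odd count rather than a count of exactly one.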

Edit (2)

Some benchmarks:

from typing import List,Set
from functools import reduce
from collections import defaultdict

length = 1_000
data = [
    {idx - 1,idx,idx + 1}
    for idx in range(3_000,3_000 + length * 2,2)
]

def test_loop1(l: List[Set[int]]) -> Set[int]:
    res = l[0].copy()
    for item in l[1:]:
        res.symmetric_difference_update(item)
    assert len(res) == len(l) + 2
    return res

test_loop1: 121 µs ± 321 ns

def test_loop2(l: List[Set[int]]) -> Set[int]:
    res = set()
    for item in l:
        res.symmetric_difference_update(item)
    assert len(res) == len(l) + 2
    return res

test_loop2: 112 µs ± 3.16 µs

def test_reduce1(l: List[Set[int]]) -> Set[int]:
    res = reduce(set.symmetric_difference, l)
    assert len(res) == len(l) + 2
    return res

test_reduce1: 9.89 ms ± 20.6 µs
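
A likely reason for the gap: `symmetric_difference` returns a new set at every step, so the growing intermediate result is copied again and again, whereas the loops update a single set in place. A `reduce` that updates in place can be sketched with `operator.ixor` and a fresh empty set as the initial value. The function name `xor_reduce_inplace` is made up for illustration, and this variant was not timed here.

```python
from functools import reduce
from operator import ixor
from typing import List, Set


def xor_reduce_inplace(sets: List[Set[int]]) -> Set[int]:
    # Start from a fresh empty set so none of the inputs is mutated.
    # ixor(acc, s) performs `acc ^= s` and returns acc, so the same
    # accumulator object is updated in place at every step.
    return reduce(ixor, sets, set())


data = [{1, 2, 3}, {3, 4, 5}, {5, 6, 7}, {7, 8, 1}]
print(xor_reduce_inplace(data) == {2, 4, 6, 8})  # True
print(data[0] == {1, 2, 3})  # True: the inputs are left untouched
```

This does the same work as test_loop2, only with the iteration driven by `reduce` instead of a `for` statement.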

def test_dict1(l: List[Set[int]]) -> Set[int]:
    """
    A general solution allowing for entries to occur more than twice in the input data
    """
    d = defaultdict(int)
    for item in l:
        for entry in item:
            d[entry] += 1
    res = {entry for item in l for entry in item if d[entry] == 1}
    assert len(res) == len(l) + 2
    return res

test_dict1: 695 µs ± 5.11 µs
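
Note that test_dict1 keeps the entries whose count is exactly 1. That matches the chained XOR only as long as no entry occurs more than twice; in general, `a ^ b ^ c ^ ...` keeps every entry that occurs in an odd number of sets. A parity-based variant could look like the sketch below. The name `xor_by_parity` is made up for illustration, and the variant is not part of the benchmark above.

```python
from collections import defaultdict
from typing import Dict, List, Set


def xor_by_parity(sets: List[Set[int]]) -> Set[int]:
    counts: Dict[int, int] = defaultdict(int)
    for item in sets:
        for entry in item:
            counts[entry] += 1
    # Chained XOR keeps an entry iff it occurs in an odd number of sets.
    return {entry for entry, n in counts.items() if n % 2 == 1}


triple = [{1, 2}, {1, 3}, {1, 4}]
print(xor_by_parity(triple) == triple[0] ^ triple[1] ^ triple[2])  # True
print(1 in xor_by_parity(triple))  # True: 1 occurs three times, an odd count
```

Iterating over `counts.items()` also avoids walking the input list a second time, as the comprehension in test_dict1 does.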


python set set-operations symmetric-difference