Group a Python list of lists based on overlapping items

Answer

You are grouping based on sets, so use a set to detect when a new group starts:

def grouper(sequence):
    group, members = [], set()

    for item in sequence:
        if group and members.isdisjoint(item):
            # new group, yield and start new
            yield group
            group, members = [], set()
        group.append(item)
        members.update(item)

    yield group

This gives:

>>> for group in grouper(paths):
...     print(group)
... 
[['D', 'B', 'A', 'H'], ['D', 'B', 'A', 'C'], ['H', 'A', 'C']]
[['E', 'G', 'I'], ['F', 'G', 'I']]

Or you can collect the generated groups into a list:

output = list(grouper(paths))
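Because this grouper only compares each list against the group it is currently building, it splits a group whose lists are not adjacent. A minimal check, using a hypothetical reordering of the question's paths (the name `interleaved` is made up for illustration), shows each list ending up in its own group:

```python
def grouper(sequence):
    # Same contiguous grouper as above: a list starts a new group when it
    # shares no item with the group currently being built.
    group, members = [], set()
    for item in sequence:
        if group and members.isdisjoint(item):
            yield group
            group, members = [], set()
        group.append(item)
        members.update(item)
    yield group

# Hypothetical reordering of the question's paths: the two overlapping
# groups are interleaved instead of adjacent.
interleaved = [['D', 'B', 'A', 'H'], ['E', 'G', 'I'], ['H', 'A', 'C'], ['F', 'G', 'I']]

for group in grouper(interleaved):
    print(group)
# [['D', 'B', 'A', 'H']]
# [['E', 'G', 'I']]
# [['H', 'A', 'C']]
# [['F', 'G', 'I']]
```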

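This assumes that groups are contiguous. If the lists belonging to one group can be spread out, you need to process the whole list and, for each item, loop over all the groups constructed so far: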

def grouper(sequence):
    result = []  # will hold (members, group) tuples

    for item in sequence:
        for members, group in result:
            if members.intersection(item):  # overlap
                members.update(item)
                group.append(item)
                break
        else:  # no group found, add new
            result.append((set(item), [item]))

    return [group for members, group in result]
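The loop above adds each item to the first group it overlaps and then stops, so if a later item overlaps two groups that were built separately, those groups are not merged: `[['A','B'], ['C','D'], ['B','C']]` comes out as `[['A','B'], ['B','C']]` and `[['C','D']]`. If you want groups joined transitively, one option is a union-find (disjoint-set) structure over the items. This is a swapped-in technique rather than part of the answer above; the function name `group_connected` is hypothetical, and the sketch assumes every item is hashable.

```python
def group_connected(sequence):
    """Group lists that are connected through shared items, transitively."""
    parent = {}  # maps each item to its parent in the union-find forest

    def find(x):
        # Walk up to the root, then point every visited item straight at it.
        parent.setdefault(x, x)
        root = x
        while parent[root] != root:
            root = parent[root]
        while parent[x] != root:
            parent[x], x = root, parent[x]
        return root

    def union(a, b):
        parent[find(a)] = find(b)

    # Link every item of a list to that list's first item.
    for lst in sequence:
        for item in lst[1:]:
            union(lst[0], item)

    # Bucket lists by the root of their first item; dicts keep insertion
    # order, so groups appear in the order their first list was seen.
    groups = {}
    for lst in sequence:
        if lst:  # an empty list shares nothing, so it is skipped
            groups.setdefault(find(lst[0]), []).append(lst)
    return list(groups.values())


bridged = [['A', 'B'], ['C', 'D'], ['B', 'C']]
print(group_connected(bridged))
# [[['A', 'B'], ['C', 'D'], ['B', 'C']]]
```

The trade-off is that every list touching a shared item lands in one group even when the overlap arrives late, which may or may not be what you want.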

Question

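I have a list of lists, and I am trying to group, or cluster, them based on their items. A nested list starts a new group if none of its elements are in the previous group.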

Input:

paths = [
    ['D','B','A','H'],
    ['D','B','A','C'],
    ['H','A','C'],
    ['E','G','I'],
    ['F','G','I']
]

My failing code:

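paths = [
    ['D','B','A','H'],
    ['D','B','A','C'],
    ['H','A','C'],
    ['E','G','I'],
    ['F','G','I']
]
groups = []
paths_clone = paths  # note: this aliases the same list, it does not copy it
for path in paths:
    for node in path:
        for path_clone in paths_clone:
            if node in path_clone:
                if not path == path_clone:
                    groups.append([path, path_clone])
                else:
                    groups.append(path)
print(groups)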

Expected output:

[
 [
  ['D','B','A','H'],
  ['D','B','A','C'],
  ['H','A','C']
 ],[
  ['E','G','I'],
  ['F','G','I']
 ]
]

Another example:

paths = [['shifter','barrel','barrel shifter'],['ARM',['IP power','IP','power'],'shifter']]

Expected output groups:

output = [
         [['shifter','shifter']],[['IP power','power']],]