How to identify possible chain patterns from a list of tuples

Problem description

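Edited

I am looking for a simple or optimized way to solve the problem below. It seems this can be done easily with "networkx" (thanks to BENY in the comments section).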

input_list  = [('A','B'),('D','C'),('C','B'),('E','D'),('I','J'),('L','K'),('J','K')] # path map

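def get_chain_list(sp,d):
    # Appends to the module-level list `result`, which the caller resets before each chain.
    global result
    result.append(sp)
    # Follow the link from sp to its successor, if it has one.
    if sp in d:
        get_chain_list(d[sp],d)
    return tuple(result)

d = dict(input_list)   # successor map: node -> next node
s1 = set(d.keys())
s2 = set(d.values())
sps = s1 - s2          # starting points: nodes that never appear as a target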

master_chain = []
for sp in sps :
    result = []
    master_chain.append(get_chain_list(sp,d))

output_list = sorted(master_chain,key=len,reverse=True)

print(output_list)
[('E','D','C','B'),('I','J','K'),('A','B'),('L','K')] # Chains in input list
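
The chain-following idea above can also be sketched without the global `result` list. This is only a minimal sketch, not the asker's code: the names `follow_chain` and `successor` are hypothetical, the cycle guard is an addition, and it assumes each node has at most one outgoing link (which `dict(input_list)` already implies).

```python
def follow_chain(start, successor):
    """Follow successor links from `start` until a node has no outgoing link."""
    chain = [start]
    seen = {start}
    while chain[-1] in successor:
        nxt = successor[chain[-1]]
        if nxt in seen:  # added guard: stop instead of looping forever on a cycle
            break
        chain.append(nxt)
        seen.add(nxt)
    return tuple(chain)

# Path map from the question (third pair taken to be ('C','B')).
input_list = [('A','B'),('D','C'),('C','B'),('E','D'),('I','J'),('L','K'),('J','K')]

successor = dict(input_list)                        # node -> next node
starts = set(successor) - set(successor.values())   # nodes that are never a target

# Sorting the starts first makes the order of equal-length chains deterministic.
chains = sorted((follow_chain(s, successor) for s in sorted(starts)),
                key=len, reverse=True)
print(chains)
```

Because the starting points are sorted before the chains are built, ties between chains of equal length come out in a stable, repeatable order.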

Solution

This is more of a networkx problem:

import networkx as nx 
G = nx.Graph()
G.add_edges_from(input_list)
l = [*nx.connected_components(G)]
Out[6]: [{'A','B','C','D','E'},{'I','J','K','L'}]
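
For readers without networkx installed, the same grouping can be sketched with a plain breadth-first search that treats every pair as an undirected edge. This is only an illustration of what the call above computes on this input: `connected_groups` is a hypothetical helper, not a networkx function.

```python
from collections import defaultdict, deque

def connected_groups(pairs):
    """Group nodes that are linked by any path, ignoring edge direction."""
    adjacency = defaultdict(set)
    for a, b in pairs:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen = set()
    groups = []
    for node in sorted(adjacency):      # sorted only to make the group order repeatable
        if node in seen:
            continue
        group = {node}
        seen.add(node)
        queue = deque([node])
        while queue:                    # breadth-first search from `node`
            current = queue.popleft()
            for neighbour in adjacency[current]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    group.add(neighbour)
                    queue.append(neighbour)
        groups.append(group)
    return groups

# Path map from the question (third pair taken to be ('C','B')).
pairs = [('A','B'),('D','C'),('C','B'),('E','D'),('I','J'),('L','K'),('J','K')]
groups = connected_groups(pairs)
print(groups)
```

Note that this yields groups of connected nodes, not ordered chains; the direction of each pair is discarded.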

Use

output_list = set(input_list)

and then form the required chain patterns as tuples, as shown below:

from string import ascii_uppercase

input_list  = [('A','B'),('D','C'),('C','B'),('E','D'),('I','J'),('L','K'),('J','K')]

src=sorted({e for t in input_list for e in t})
ss=""
tgt=[]

for c in src:
    if ss+c in ascii_uppercase:
        ss+=c
    else:
        tgt.append(tuple(ss))
        ss=c
else:
    tgt.append(tuple(ss))

>>> tgt
[('A', 'B', 'C', 'D', 'E'), ('I', 'J', 'K', 'L')]
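
The loop above groups the sorted labels into runs of consecutive letters. It never looks at the pairs themselves, so it matches the connected groups only when the labels in each group happen to be alphabetically contiguous, as they are here. The same run-grouping can be sketched with `itertools.groupby`. This is a minimal sketch that assumes single-character labels, and the names `labels` and `runs` are illustrative.

```python
from itertools import groupby

# Path map from the question (third pair taken to be ('C','B')).
input_list = [('A','B'),('D','C'),('C','B'),('E','D'),('I','J'),('L','K'),('J','K')]

labels = sorted({e for pair in input_list for e in pair})

# Consecutive letters share the same (code point - position) offset,
# so grouping on that offset splits the sorted labels into runs.
runs = [
    tuple(label for _, label in group)
    for _, group in groupby(enumerate(labels),
                            key=lambda item: ord(item[1]) - item[0])
]
print(runs)
```

Unlike the `ascii_uppercase` check, this version is not limited to uppercase letters, but it still depends on how the labels are named rather than on how they are linked.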