使用字典数据结构在有向图中找到至少有 3 个节点的所有循环

问题描述

Directed Graph

上图是用 LaTeX 绘制的:https://www.overleaf.com/read/rxhpghzbkhby

上图在 Python 中表示为字典。

graph = {
    'A' : ['B','D','C'],'B' : ['C'],'C' : [],'D' : ['E'],'E' : ['G'],'F' : ['A','I'],'G' : ['A','K'],'H' : ['F','G'],'I' : ['H'],'J' : ['A'],'K' : []
}

我有一个包含大约 3,378,546 个节点的大图。

给定上面的有向图,我试图找到具有至少 3 个且少于 5 个不同节点的圆,并输出前 3 个圆。

我在这个问题上花了 1 天半的时间。我查看了 Stackoverflow,甚至尝试按照此 Detect Cycle in a Directed Graph 教程进行操作,但找不到解决方案。

在这个例子中,输出是一个制表符分隔的文本文件,其中每一行都有一个循环。

0 A,D,E,G
1 F,I,H

01 是索引。 此外,图节点的字母表中没有顺序。

我试过这个表格 How to implement depth-first search in Python 教程:

visited = set()

def dfs(visited,graph,node):
    if node not in visited:
        print (node)
        visited.add(node)
        for neighbour in graph[node]:
            dfs(visited,neighbour)

dfs(visited,'A')

但这无济于事。我也试过这个Post

解决方法

注意:此解决方案是描述解决方案的扩展解决方案。我扩展到具有约 300 万个节点的原始图,并查找至少包含 3 个节点且少于 40 个节点的所有循环,并将前 3 个循环存储到一个文件中。


我想出了以下解决方案。

# implementation of Johnson's cycle finding algorithm
# Original paper: Donald B Johnson. "Finding all the elementary circuits of a directed graph." SIAM Journal on Computing. 1975.

from collections import defaultdict

import networkx as nx
from networkx.utils import not_implemented_for,pairwise

@not_implemented_for("undirected")
def findCycles(G):
    """Find simple cycles of a directed graph.
    A `simple cycle` is a closed path where no node appears twice.
    Two elementary circuits are distinct if they are not cyclic permutations of each other.
    This is iterator/generator version of Johnson's algorithm [1]_.
    There may be better algorithms for some cases [2]_ [3]_.
    Parameters
    ----------
    G : NetworkX DiGraph
       A directed graph
    Returns
    -------
    cycle_generator: generator
       A generator that produces elementary cycles of the graph.
       Each cycle is represented by a list of nodes along the cycle.
    Examples
    --------
    >>> graph = {'A' : ['B','D','C'],'B' : ['C'],'C' : [],'D' : ['E'],'E' : ['G'],'F' : ['A','I'],'G' : ['A','K'],'H' : ['F','G'],'I' : ['H'],'J' : ['A'],'K' : []
                }
    >>> G = nx.DiGraph()
    >>> G.add_nodes_from(graph.keys())
    >>> for keys,values in graph.items():
            G.add_edges_from(([(keys,node) for node in values]))
    >>> list(nx.findCycles(G))
    [['F','I','H'],['G','A','E']]
    Notes
    -----
    The implementation follows pp. 79-80 in [1]_.
    The time complexity is $O((n+e)(c+1))$ for $n$ nodes,$e$ edges and $c$
    elementary circuits.
    References
    ----------
    .. [1] Finding all the elementary circuits of a directed graph.
       D. B. Johnson,SIAM Journal on Computing 4,no. 1,77-84,1975.
       https://doi.org/10.1137/0204007
    .. [2] Enumerating the cycles of a digraph: a new preprocessing strategy.
       G. Loizou and P. Thanish,Information Sciences,v. 27,163-182,1982.
    .. [3] A search strategy for the elementary cycles of a directed graph.
       J.L. Szwarcfiter and P.E. Lauer,BIT NUMERICAL MATHEMATICS,v. 16,no. 2,192-204,1976.
    --------
    """

    def _unblock(thisnode,blocked,B):
        stack = {thisnode}
        while stack:
            node = stack.pop()
            if node in blocked:
                blocked.remove(node)
                stack.update(B[node])
                B[node].clear()

    # Johnson's algorithm requires some ordering of the nodes.
    # We assign the arbitrary ordering given by the strongly connected comps
    # There is no need to track the ordering as each node removed as processed.
    # Also we save the actual graph so we can mutate it. We only take the
    # edges because we do not want to copy edge and node attributes here.
    subG = type(G)(G.edges())
    sccs = [scc for scc in nx.strongly_connected_components(subG) if len(scc) in list(range(3,41))]

    # Johnson's algorithm exclude self cycle edges like (v,v)
    # To be backward compatible,we record those cycles in advance
    # and then remove from subG
    for v in subG:
        if subG.has_edge(v,v):
            yield [v]
            subG.remove_edge(v,v)

    while sccs:
        scc = sccs.pop()
        sccG = subG.subgraph(scc)
        # order of scc determines ordering of nodes
        startnode = scc.pop()
        # Processing node runs "circuit" routine from recursive version
        path = [startnode]
        blocked = set()  # vertex: blocked from search?
        closed = set()  # nodes involved in a cycle
        blocked.add(startnode)
        B = defaultdict(set)  # graph portions that yield no elementary circuit
        stack = [(startnode,list(sccG[startnode]))]  # sccG gives comp nbrs
        while stack:
            thisnode,nbrs = stack[-1]
            if nbrs:
                nextnode = nbrs.pop()
                if nextnode == startnode:
                    yield path[:]
                    closed.update(path)
                #                        print "Found a cycle",path,closed
                elif nextnode not in blocked:
                    path.append(nextnode)
                    stack.append((nextnode,list(sccG[nextnode])))
                    closed.discard(nextnode)
                    blocked.add(nextnode)
                    continue
            # done with nextnode... look for more neighbors
            if not nbrs:  # no more nbrs
                if thisnode in closed:
                    _unblock(thisnode,B)
                else:
                    for nbr in sccG[thisnode]:
                        if thisnode not in B[nbr]:
                            B[nbr].add(thisnode)
                stack.pop()
                path.pop()
        # done processing this node
        H = subG.subgraph(scc)  # make smaller to avoid work in SCC routine
        sccs.extend(scc for scc in nx.strongly_connected_components(H) if len(scc) in list(range(3,41)))
import sys,csv,json

def findAllCycles(jsonInputFile,textOutFile):
    """Find simple cycles of a directed graph (jsonInputFile).
    Parameters:
    ----------
        jsonInputFile: a json file that has all concepts
        textOutFile: give a desired name of output file
    Returns:
    ----------
        a .text file (named: {textOutFile}.txt) has the first 3 cycles found in jsonInputFile
        Each cycle is represented by a list of nodes along the cycle
    """

    with open(jsonInputFile) as infile:
        graph = json.load(infile)
    # Convert the json file to a NetworkX directed graph
    G = nx.DiGraph()
    G.add_nodes_from(graph.keys())
    for keys,values in graph.items():
        G.add_edges_from(([(keys,node) for node in values]))
    # Search for all simple cycles existed in the graph
    _cycles = list(findCycles(G))
    # Start with an empty list and populate it by looping over all cycles
    # in _cycles that have at least 3 and less than 40 different concepts (nodes)
    cycles = []
    for cycle in _cycles:
        if len(cycle) in list(range(3,41)):
            cycles.append(cycle)
    # Store the cycels under constraint in {textOutFile}.txt
    with open(textOutFile,'w') as outfile:
        for cycle in cycles[:3]:
            outfile.write(','.join(n for n in cycle)+'\n')
        outfile.close()
    # When process finishes,print Done!!
    return 'Done!!'


infile = sys.argv[1]
outfile = sys.argv[2]
first_cycles = findAllCycles(infile,outfile)

要运行此程序,您只需使用如下命令行:

>> python3 {program file name}.py graph.json {desired output file name}[.txt][.csv]

让,例如{所需的输出文件名}}.[txt][.csv],为first_3_cycles_found.txt

就我而言,该图有 3,378,546 个节点,使用上述代码我花了大约 40 分钟才找到所有循环。因此,输出文件将是:

enter image description here

如果您认为需要改进或添加其他内容,请为此做出贡献。

,

这是一个注释代码,它将打印包含找到的循环的数组。我相信将返回值调整为所需的格式(我认为在您的情况下为 CSV)不需要更多。

这可能是因为使用 3M 节点,结果证明速度很慢。然后,我建议采用动态编程的方式并缓存/记住一些递归的结果,以免重复它们。

我希望这能解决您的问题或至少有所帮助。

def cycles_rec(root,current_node,graph,depth,visited,min_depth,max_depth):
    depth += 1

    # First part our stop conditions
    if current_node in visited or current_node not in graph.keys():
        return ''

    if depth >= max_depth:
        return ''

    visited.append(current_node)

    if root in graph[current_node] and depth >= min_depth:
        return current_node

    # The recursive part
    # for each connection we try to find recursively one that would cycle back to our root
    for connections in graph[current_node]:
        for connection in connections:
            result = cycles_rec(root,connection,max_depth)
            # If a match was found,it would "bubble up" here,we can return it along with the
            # current connection that "found it"
            if result != '':
                return current_node + ' ' + result

    # If we are here we found no cycle        
    return ''

def cycles(graph,min_depth = 3,max_depth = 5):
    cycles = {}
    for node,connections in graph.items():
        for connection in connections:
            visited = []
            # Let the recursion begin here
            result = cycles_rec(node,1,max_depth)
            if result == '':
                continue 
            # Here we found a cycle.
            # Fingerprint is only necessary in order to not repeat the cycles found in the results
            # It could be ignored if repeating them is not important
            # It's based on the fact that nodes are all represented as letters here
            # It could be it's own function returning a hash for example if nodes have a more
            # complex representation
            fingerprint = ''.join(sorted(list(node + ' ' + result)))
            if fingerprint not in cycles.keys():
                cycles[fingerprint] = node + ' ' + result

    return list(cycles.values())

因此,假设您在示例中声明了图形变量:

print(cycles(graph,3,5))

会打印出来

['A D E G','F I H']

相关问答

错误1:Request method ‘DELETE‘ not supported 错误还原:...
错误1:启动docker镜像时报错:Error response from daemon:...
错误1:private field ‘xxx‘ is never assigned 按Alt...
报错如下,通过源不能下载,最后警告pip需升级版本 Requirem...