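DFS to find all possible paths is very slow

Problem description

I wrote a DFS-like algorithm to find all possible paths starting from level zero. With 2,000 nodes and 5,000 edges, the code below runs very slowly. Are there any suggestions for this algorithm?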

    all_path = []

    def printAllPathsUntil(s,path):
        path.append(s)
        if s not in adj or len(adj[s]) <= 0:
            all_path.append(path[:]) # EDIT2
        else:
            for i in adj[s]:
                printAllPathsUntil(i,path)
        path.pop()

    for point in points_in_start:
        path = []
        printAllPathsUntil(point,path)

adj holds the edges: each start node is a key, and its value is the list of destination nodes.

    points_in_start = [0,3,7]
    adj = {0: [1,8],1: [2,5],2: [],3: [2,4],4: [],5: [6],6: [],7: [6],8: [2]
           }
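
If the graph is stored as a plain list of edges, the adj dictionary described above could be built along these lines. This is only a sketch: the edges list and the helper names build_adj and find_start_points are hypothetical, and it assumes that the "level zero" start points are the nodes with no incoming edge.

```python
# Hypothetical edge list matching the example graph above.
edges = [(0, 1), (0, 8), (1, 2), (1, 5), (3, 2), (3, 4), (5, 6), (7, 6), (8, 2)]

def build_adj(edges):
    """Map each start node to the list of its destination nodes."""
    adj = {}
    for src, dst in edges:
        adj.setdefault(src, []).append(dst)
        adj.setdefault(dst, [])  # make sure leaf nodes get an empty list
    return adj

def find_start_points(adj):
    """Assumption: start points are the nodes that are never a destination."""
    targets = {dst for dsts in adj.values() for dst in dsts}
    return sorted(node for node in adj if node not in targets)

adj = build_adj(edges)
points_in_start = find_start_points(adj)
```

With the edge list above, this reproduces the adj and points_in_start values shown in the question.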

EDIT1

  • This is a DAG. There are no cycles.


Solution

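The problem with your algorithm is that it repeats a lot of work. That does not happen in your example, because the only node reached from two other nodes is a leaf, such as C. But imagine an edge from D to B: the entire subgraph starting at B would then be visited again! For a graph with 2,000 nodes, this leads to a significant slowdown.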
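
To make the repeated work visible, here is a small sketch that counts how often a plain recursive DFS enters each node. The graph and its letter labels are hypothetical (they are not necessarily the graph from the question); the only point is that B is reachable both directly from A and through D.

```python
from collections import Counter

# Hypothetical DAG: B is reached from A directly and again via D.
adj = {"A": ["B", "D"], "D": ["B"], "B": ["C", "E"], "C": [], "E": ["F"], "F": []}

visits = Counter()

def count_visits(node):
    """Plain DFS without memoization; records every entry into a node."""
    visits[node] += 1
    for child in adj.get(node, []):
        count_visits(child)

count_visits("A")
print(dict(visits))
```

In this sketch B and everything below it (C, E, F) are entered twice, while A and D are entered once. Every additional parent of a shared node repeats that node's whole subgraph again.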

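To fix this, you can use memoization, but that means restructuring your algorithm: instead of appending to the existing path and then adding path to all_path, it has to return the (partial) paths starting at the current node and combine them with the parent node into full paths. Then, when you visit B again coming from another node, functools.lru_cache lets you reuse all those partial results.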
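
A minimal sketch of that restructuring, using the example data from the question. The function name paths_from is my own choice, and paths are built as tuples so the cached results cannot be modified by accident; treat it as one possible shape rather than a definitive implementation.

```python
from functools import lru_cache

points_in_start = [0, 3, 7]
adj = {0: [1, 8], 1: [2, 5], 2: [], 3: [2, 4], 4: [], 5: [6], 6: [], 7: [6], 8: [2]}

@lru_cache(maxsize=None)
def paths_from(node):
    """Return a tuple of all paths (each a tuple) from node down to a leaf."""
    children = adj.get(node, [])
    if not children:                      # leaf: the only path is the node itself
        return ((node,),)
    return tuple((node,) + tail           # prepend node to each child path
                 for child in children
                 for tail in paths_from(child))

all_path = [list(p) for start in points_in_start for p in paths_from(start)]
print(all_path)
```

Each node's partial paths are computed once and then served from the cache. Note that memoization only removes the repeated traversal: the number of paths itself can still be very large, and for very deep graphs the recursion may hit Python's recursion limit.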


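As already pointed out in the comments and the other answer, remembering the downstream paths of previously visited nodes is one area for optimization.

Here is my attempt at implementing that.

Here, downstream_paths is a dictionary in which we remember the downstream paths of every visited non-leaf node.

At the end I have included the %%timeit results for a small test case that contains one "revisited non-leaf" case. Since my test case revisits a non-leaf node only once, the improvement is modest. On your large dataset the performance gap may well be bigger.

Input data: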

points_in_start = [0,3,7]
adj = {0: [1,8],1: [2,5],3: [2,4],4: [],5: [6],6: [],7: [6],8: [2],
       # Non-leaf node "2" is a child of "1", "8" and "3"
       2: [10],10: [11,18],11: [12,15],12: [],15: [16],16: [],18: [12]
      }

Modified code:

%%timeit

downstream_paths = {}                                 # Maps each node to its
                                                      # list of downstream paths
                                                      # starting with that node.

def getPathsToLeafsFrom(s):      # Returns list of downstream paths starting from s
                                 # and ending in some leaf node.
    children = adj.get(s,[])
    if not children:                                  # s is a Leaf
        paths_from_s = [[s]]
    else:                                             # s is a Non-leaf
        ds_paths = downstream_paths.get(s,[])        # Check if s was previously visited
        if ds_paths:                                  # If s was previously visited.
            paths_from_s = ds_paths
        else:                                         # s was not visited earlier.
            paths_from_s = []                         # Initialize
            for child in children:
                paths_from_child = getPathsToLeafsFrom(child)   # Recurse for each child
                for p in paths_from_child:
                    paths_from_s.append([s] + p)
            downstream_paths[s] = paths_from_s       # Cache this,to use when s is re-visited
    return paths_from_s

path = []
for point in points_in_start:
    path.extend(getPathsToLeafsFrom(point))

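Output:

from pprint import pprint
pprint(path)

[[0, 1, 2, 10, 11, 12],
 [0, 1, 2, 10, 11, 15, 16],
 [0, 1, 2, 10, 18, 12],
 [0, 1, 5, 6],
 [0, 8, 2, 10, 11, 12],
 [0, 8, 2, 10, 11, 15, 16],
 [0, 8, 2, 10, 18, 12],
 [3, 2, 10, 11, 12],
 [3, 2, 10, 11, 15, 16],
 [3, 2, 10, 18, 12],
 [3, 4],
 [7, 6]]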

Timing results, originally posted code:

10000 loops, best of 3: 63 µs per loop

Timing results, optimized code:

10000 loops, best of 3: 43.2 µs per loop
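
The timings above come from IPython's %%timeit. Outside a notebook, a comparable measurement can be sketched with the standard timeit module. The wrapper functions original and memoized below are my own condensed restatements of the two approaches, not the exact code from the posts, and the numbers will differ from machine to machine.

```python
import timeit

points_in_start = [0, 3, 7]
adj = {0: [1, 8], 1: [2, 5], 3: [2, 4], 4: [], 5: [6], 6: [], 7: [6], 8: [2],
       2: [10], 10: [11, 18], 11: [12, 15], 12: [], 15: [16], 16: [], 18: [12]}

def original():
    """Plain DFS that re-walks shared subgraphs."""
    all_path = []
    def walk(s, path):
        path.append(s)
        if not adj.get(s):
            all_path.append(path[:])
        else:
            for child in adj[s]:
                walk(child, path)
        path.pop()
    for point in points_in_start:
        walk(point, [])
    return all_path

def memoized():
    """DFS that caches the downstream paths of every visited node."""
    cache = {}
    def paths_from(s):
        if s not in cache:
            children = adj.get(s, [])
            cache[s] = ([[s] + p for c in children for p in paths_from(c)]
                        if children else [[s]])
        return cache[s]
    return [p for point in points_in_start for p in paths_from(point)]

assert original() == memoized()  # both approaches must yield the same paths

for fn in (original, memoized):
    best = min(timeit.repeat(fn, number=1000, repeat=3)) / 1000
    print(f"{fn.__name__}: {best * 1e6:.1f} µs per loop")
```

Checking that both variants return the same list before timing them guards against comparing a fast but wrong implementation with a slow but correct one.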