Finding an induced subgraph of a GraphFrames graph in PySpark

Problem description

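Is there a way to find the induced subgraph around a given center node of a GraphFrame graph in PySpark? I tried to build the induced subgraph from motifs, but without success.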

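I tried NetworkX's ego graph (ego_graph), which works as expected, but for a large graph (12 million edges) loading the whole graph into NetworkX takes a very long time.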
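For reference, NetworkX's ego_graph(G, n, radius) returns the subgraph induced by every node within radius hops of n, following out-edges on a directed graph unless undirected=True. The sketch below reproduces those semantics on an in-memory edge list, which can serve as a small-scale check for any Spark version. The function name ego_subgraph and the toy data are hypothetical. On Spark, the hop expansion would presumably become repeated joins of the frontier with the edge DataFrame, and the final edge filter two left-semi joins; that mapping is an assumption, not something the post tested.

```python
from collections import deque


def ego_subgraph(edges, center, radius, undirected=False):
    """Return (vertices, edges) of the subgraph induced by all vertices
    within `radius` hops of `center` (hypothetical helper).

    edges: iterable of (src, dst) pairs. By default hops follow the edge
    direction (src -> dst); with undirected=True they go both ways.
    """
    edges = list(edges)

    # Adjacency list used only for the hop expansion.
    adj = {}
    for src, dst in edges:
        adj.setdefault(src, set()).add(dst)
        if undirected:
            adj.setdefault(dst, set()).add(src)

    # Breadth-first search that stops expanding after `radius` hops.
    dist = {center: 0}
    queue = deque([center])
    while queue:
        node = queue.popleft()
        if dist[node] == radius:
            continue
        for nbr in adj.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)

    keep = set(dist)
    # Induced edges: both endpoints must lie inside the neighbourhood,
    # so edges between two neighbours are kept as well.
    sub_edges = [(s, d) for s, d in edges if s in keep and d in keep]
    return keep, sub_edges


# Hypothetical toy graph (src, dst), not the data from the post
toy = [("a", "b"), ("a", "d"), ("b", "d"), ("c", "b"), ("d", "c")]
nodes, sub = ego_subgraph(toy, "a", 1)
print(sorted(nodes), sub)  # ['a', 'b', 'd'] [('a', 'b'), ('a', 'd'), ('b', 'd')]
```

Note that the neighbour-to-neighbour edge b -> d is part of the result even though it does not touch the center.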

Here is an example with center node 'a':

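    from pyspark.sql import SparkSession
    from pyspark.sql import functions as f
    from graphframes import GraphFrame

    # sqlc only needs createDataFrame(), so a SparkSession works here
    sqlc = SparkSession.builder.getOrCreate()

    # Vertex DataFrame
    v = sqlc.createDataFrame([
        ("a", "Alice", 34), ("b", "Bob", 36), ("c", "Charlie", 30),
        ("d", "David", 29), ("e", "Esther", 32),
        ("f", "Fanny", 36),  # age missing in the post; 36 as in the GraphFrames user guide
        ("g", "Gabby", 60),
    ], ["id", "name", "age"])

    # Edge DataFrame
    e = sqlc.createDataFrame([
        ("a", "b", "friend"),
        # ... (remaining edges elided)
        ("a", "e", "friend"),
    ], ["src", "dst", "relationship"])

    # Create a GraphFrame
    g = GraphFrame(v, e)


    def create_motif(length: int) -> str:
        """Create a motif string describing a directed path.

        Args:
            length (int): number of edges in the path (the search depth).

        Returns:
            str: "(start)-[edge0]->(end)" for length 1,
            "(start)-[edge0]->(n0);(n0)-[edge1]->(end)" for length 2, etc.
        """
        motif_path = "(start)-[edge0]->"
        for i in range(1, length):
            motif_path += "(n%s);(n%s)-[edge%s]->" % (i - 1, i - 1, i)
        motif_path += "(end)"
        return motif_path


    def get_community(G, depth):
        motif_path = create_motif(depth)
        current_motif = G.find(motif_path)
        # Expand the start vertex into its own columns, then show every match
        current_motif.select(f.col("start.*"), "*").show()


    # Call after the definitions; calling earlier raises a NameError
    get_community(g, 1)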

Output:

+---+-----+---+--------------+--------------+---------------+
| id| name|age|         start|         edge0|            end|
+---+-----+---+--------------+--------------+---------------+
|  a|Alice| 34|[a, Alice, 34]|[a, e, friend]|[e, Esther, 32]|
|  a|Alice| 34|[a, Alice, 34]|[a, b, friend]|   [b, Bob, 36]|
+---+-----+---+--------------+--------------+---------------+

It should instead return:

+---+-----+---+--------------+--------------+---------------+
| id| name|age|         start|         edge0|            end|
+---+-----+---+--------------+--------------+---------------+
|  a|Alice| 34|[a,36]|
|  a|Alice| 34|[a, Alice, 34]|[a, d, friend]| [d, David, 29]|
|  b|  Bob| 36|[b,36]|[b,29]|
+---+-----+---+--------------+--------------+---------------+
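The gap between the two tables can be reproduced in plain Python. A depth-1 motif "(start)-[edge0]->(end)" matches every edge, so even after keeping only rows whose start is the center (a filter get_community as written does not apply), it returns just the center's out-edges. The induced subgraph on the same vertices additionally keeps edges between neighbours. A minimal sketch on a hypothetical edge list (the b -> d edge is assumed for illustration):

```python
# Hypothetical toy edge list (src, dst, relationship), not the data from the post
edges = [
    ("a", "b", "friend"),
    ("a", "d", "friend"),
    ("b", "d", "friend"),
    ("c", "b", "follow"),
]
center = "a"

# Depth-1 motif rows, filtered to those whose start vertex is the center:
# these are only the center's out-edges.
motif_rows = [e for e in edges if e[0] == center]

# Vertex set of the 1-hop neighbourhood: the center plus its out-neighbours.
vertices = {center} | {dst for _, dst, _ in motif_rows}

# Induced subgraph: every edge whose two endpoints are both in the vertex set.
induced = [e for e in edges if e[0] in vertices and e[1] in vertices]

# Edges the motif approach misses.
missing = [e for e in induced if e not in motif_rows]
print(missing)  # [('b', 'd', 'friend')]
```

In other words, the motif only supplies the vertex set; a second pass over all edges is still needed to pick up the neighbour-to-neighbour edges.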

Solution

No working solution has been found yet.
