Neptune - How to weigh all nodes by proportional edge weight in Gremlin

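Problem description

I am having a hard time working out a Gremlin query for the following scenario. This is a directed graph (it may be cyclic).

I want to get the top N favored nodes starting from the node "Jane", where favor is defined as: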

favor(Jane->Lisa) = edge(Jane,Lisa) / total weight from outwards edges of Jane
favor(Jane->Thomas) = favor(Jane->Thomas)@directEdge + favor(Jane->Lisa) * favor(Lisa->Thomas)

favor(Jane->Jerryd) = favor(Jane->Thomas) * favor(Thomas->Jerryd) + favor(Jane->Lisa) * favor(Lisa->Jerryd)

favor(Jane->Jerryd) = [favor(Jane->Thomas)@directEdge + favor(Jane->Lisa) * favor(Lisa->Thomas)] * favor(Thomas->Jerryd) + favor(Jane->Lisa) * favor(Lisa->Jerryd)


and so on.

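Here is the same graph with a hand calculation showing what I mean.

This is fairly simple to do programmatically, but I am not sure exactly how to query it with Gremlin or even SPARQL.

Here is the query to create this example graph: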

g
.addV('person').as('1').property(single,'name','jane')
.addV('person').as('2').property(single,'name','thomas')
.addV('person').as('3').property(single,'name','lisa')
.addV('person').as('4').property(single,'name','wyd')
.addV('person').as('5').property(single,'name','jerryd')
.addE('favor').from('1').to('2').property('weight',10)
.addE('favor').from('1').to('3').property('weight',20)
.addE('favor').from('3').to('2').property('weight',90)
.addE('favor').from('2').to('4').property('weight',50)
.addE('favor').from('2').to('5').property('weight',90)
.addE('favor').from('3').to('5').property('weight',100)

What I am looking for is:

[Lisa,computedFavor]
[Thomas,computedFavor]
[Jerryd,computedFavor]
[Wyd,computedFavor]

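I am struggling with how to adjust the weights without getting caught in cycles in the graph. This is as far as I have been able to get with a query: https://gremlify.com/f2r0zy03oxc/2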

g.V().has('name','jane').       // our starting node
   repeat(                      
      union(                    
         outE()                 // get only outwards edges
      ).
      otherV().simplePath()).   // produce simple path
   emit().  
   times(10).                   // max depth of 10
   path().                      // attain path
   by(valueMap())

Addressing Stephen Mallette's comment:

favor(Jane->Jerryd) = 
    favor(Jane->Thomas) * favor(Thomas->Jerryd) 
  + favor(Jane->Lisa) * favor(Lisa->Jerryd)

// note we can expand on favor(Jane->Thomas) in above expression
// 
// favor(Jane->Thomas) is favor(Jane->Thomas)@directEdge +
//                        favor(Jane->Lisa) * favor(Lisa->Thomas)
//

Calculation example

Jane to Lisa                   => 20/(10+20)         => 2/3
Lisa to Jerryd                 => 100/(100+90)       => 10/19
Jane to Lisa to Jerryd         => 2/3*(10/19)

Jane to Thomas (directly)      => 10/(10+20)         => 1/3
Jane to Lisa to Thomas         => 2/3 * 90/(100+90)  => 2/3 * 9/19
Jane to Thomas                 => 1/3 + (2/3 * 9/19)

Thomas to Jerryd               => 90/(90+50)         => 9/14
Jane to Thomas to Jerryd       => [1/3 + (2/3 * 9/19)] * (9/14)

Jane to Jerryd:
= Jane to Lisa to Jerryd + Jane to Thomas to Jerryd
= 2/3 * (10/19) + [1/3 + (2/3 * 9/19)] * (9/14)
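For anyone who wants to check the arithmetic above, here is a small Python sketch that evaluates the same expressions with exact fractions. The variable names are mine and only mirror the rows of the calculation; nothing here talks to a graph database.

```python
from fractions import Fraction as F

# Per-edge shares: edge weight divided by the total outgoing weight of its source.
jane_lisa = F(20, 10 + 20)
jane_thomas_direct = F(10, 10 + 20)
lisa_jerryd = F(100, 100 + 90)
lisa_thomas = F(90, 100 + 90)
thomas_jerryd = F(90, 90 + 50)

# Jane to Thomas: the direct edge plus the route through Lisa.
jane_thomas = jane_thomas_direct + jane_lisa * lisa_thomas

# Jane to Jerryd: the route through Lisa plus everything that reaches Thomas.
jane_jerryd = jane_lisa * lisa_jerryd + jane_thomas * thomas_jerryd

print(jane_thomas, float(jane_thomas))
print(jane_jerryd, float(jane_jerryd))
```

The two fractions come out as 37/57 (about 0.649) for Thomas and 613/798 (about 0.768) for Jerryd.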

Here is a bit of pseudocode:

from collections import deque

def get_favors(graph, label="jane", starting_favor=1):
    # graph.findNode and node.out_edges stand in for whatever graph API is used
    start = graph.findNode(label)
    queue = deque([(start, starting_favor)])
    favors = {}
    seen = set()

    while queue:
        node, curr_favor = queue.popleft()

        # get total weight (out edges) from this node
        total_favor = sum(edgeW for (edgeW, outNode) in node.out_edges)

        for (edgeW, outNode) in node.out_edges:
            # take the current favor and give this node its proportional share,
            # adding to whatever favor it already has
            share = curr_favor * (edgeW / total_favor)
            favors[outNode] = favors.get(outNode, 0) + share

            # if we have seen this edge and node, ignore;
            # otherwise, traverse
            if (edgeW, outNode) not in seen:
                seen.add((edgeW, outNode))
                queue.append((outNode, favors[outNode]))

    # sort favor by value and return top X
    return favors

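Solution

Here is a Gremlin query that I believe applies your formula correctly. I will first paste the full final query and then say a few words about the steps involved.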

gremlin> g.withSack(1).V().
......1>    has('name','jane').
......2>    repeat(outE().
......3>           sack(mult).
......4>             by(project('w','f').
......5>               by('weight').
......6>               by(outV().outE().values('weight').sum()).
......7>               math('w / f')).
......8>           inV().
......9>           simplePath()).
.....10>    until(has('name','jerryd')).
.....11>    sack().
.....12>    sum()     

==>0.768170426065163

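The query starts at Jane and keeps traversing until all paths to Jerryd have been explored. Along the way a sack is maintained for each traverser, holding the product of the calculated weight values for each relationship on its path. The calculation on line 6 sums the weights of all edges leaving the prior vertex, and the math step on line 7 divides the current edge's weight by that sum. Finally, the per-path results are added together on line 12. If you remove the final sum step you can see the intermediate results.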

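gremlin> g.withSack(1).V().
......1>    has('name','jane').
......2>    repeat(outE().
......3>           sack(mult).
......4>             by(project('w','f').
......5>               by('weight').
......6>               by(outV().outE().values('weight').sum()).
......7>               math('w / f')).
......8>           inV().
......9>           simplePath()).
.....10>    until(has('name','jerryd')).
.....11>    sack()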

==>0.2142857142857143
==>0.3508771929824561
==>0.2030075187969925
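If it helps to see the same idea outside of Gremlin, here is a rough Python sketch of what the traversal does, assuming the example graph is held in a plain dict. `GRAPH` and `path_favors` are names I made up for illustration; they are not part of TinkerPop or Neptune.

```python
# Example graph from the question: source -> {target: weight}
GRAPH = {
    "jane": {"thomas": 10, "lisa": 20},
    "lisa": {"thomas": 90, "jerryd": 100},
    "thomas": {"wyd": 50, "jerryd": 90},
    "wyd": {},
    "jerryd": {},
}

def path_favors(graph, start, target):
    """Yield (path, sack) for every simple path from start to target."""
    def walk(node, path, sack):
        if node == target and len(path) > 1:
            # like until(): stop extending a path once it reaches the target
            yield path, sack
            return
        total = sum(graph[node].values())  # total outgoing weight of this vertex
        for nxt, weight in graph[node].items():
            if nxt in path:
                continue  # like simplePath(): never revisit a vertex
            # like sack(mult): multiply in this edge's share of the total
            yield from walk(nxt, path + [nxt], sack * weight / total)
    yield from walk(start, [start], 1.0)

results = list(path_favors(GRAPH, "jane", "jerryd"))
for path, sack in results:
    print(path, sack)
print(sum(sack for _, sack in results))
```

This should list the same three paths as the intermediate results above, and their sum should agree with the 0.768... value from the full query, up to floating point rounding.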

To see the routes taken, a path step can be added to the query.

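gremlin> g.withSack(1).V().
......1>    has('name','jane').
......2>    repeat(outE().
......3>           sack(mult).
......4>             by(project('w','f').
......5>               by('weight').
......6>               by(outV().outE().values('weight').sum()).
......7>               math('w / f')).
......8>           inV().
......9>           simplePath()).
.....10>    until(has('name','jerryd')).
.....11>    local(
.....12>      union(
.....13>        path().
.....14>          by('name').
.....15>          by('weight'),
.....16>        sack()).fold())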

==>[[jane,10,thomas,90,jerryd],0.2142857142857143]
==>[[jane,20,lisa,100,jerryd],0.3508771929824561]
==>[[jane,20,lisa,90,thomas,90,jerryd],0.2030075187969925]

This approach also takes direct connections into account, as your formula requires, which we can see if we use Thomas as the target.

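gremlin> g.withSack(1).V().
......1>    has('name','jane').
......2>    repeat(outE().
......3>           sack(mult).
......4>             by(project('w','f').
......5>               by('weight').
......6>               by(outV().outE().values('weight').sum()).
......7>               math('w / f')).
......8>           inV().
......9>           simplePath()).
.....10>    until(has('name','thomas')).
.....11>    local(
.....12>      union(
.....13>        path().
.....14>          by('name').
.....15>          by('weight'),
.....16>        sack()).fold())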

==>[[jane,10,thomas],0.3333333333333333]
==>[[jane,20,lisa,90,thomas],0.3157894736842105]

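These extra steps are not required, but including the path is useful when debugging queries like this. Also, purely as a matter of general interest, you can get to the final answer from here as well, although the first query I included is all you really need.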

g.withSack(1).V().
   has('name','jane').
   repeat(outE().
          sack(mult).
            by(project('w','f').
              by('weight').
              by(outV().outE().values('weight').sum()).
              math('w / f')).
          inV().
          simplePath()).
   until(has('name','thomas')).
   local(
     union(
       path().
         by('name').
         by('weight'),sack()).fold().tail(local)).  
    sum() 
  
==>0.6491228070175439

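Please let me know if any of this is unclear or if I have misunderstood the formula.

EDITED to add

To find the results for everyone reachable from Jane, I had to modify the query a little. The unfold at the end is only there to make the results easier to read.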

gremlin> g.withSack(1).V().
......1>    has('name','jane').
......2>    repeat(outE().
......3>           sack(mult).
......4>             by(project('w','f').
......5>               by('weight').
......6>               by(outV().outE().values('weight').sum()).
......7>               math('w / f')).
......8>           inV().
......9>           simplePath()).
.....10>    emit().
.....11>    local(
.....12>      union(
.....13>        path().
.....14>          by('name').
.....15>          by('weight').unfold(),
.....16>        sack()).fold()).
.....17>        group().
.....18>          by(tail(local,2).limit(local,1)).
.....19>          by(tail(local).sum()).
.....20>        unfold()

==>jerryd=0.768170426065163
==>wyd=0.23182957393483708
==>lisa=0.6666666666666666
==>thomas=0.6491228070175439

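The final group step on line 17 uses the path results to calculate the total favor for each unique name found. To look at the paths, you can run the query with the group step removed.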

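gremlin> g.withSack(1).V().
......1>    has('name','jane').
......2>    repeat(outE().
......3>           sack(mult).
......4>             by(project('w','f').
......5>               by('weight').
......6>               by(outV().outE().values('weight').sum()).
......7>               math('w / f')).
......8>           inV().
......9>           simplePath()).
.....10>    emit().
.....11>    local(
.....12>      union(
.....13>        path().
.....14>          by('name').
.....15>          by('weight').unfold(),
.....16>        sack()).fold())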

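==>[jane,10,thomas,0.3333333333333333]
==>[jane,20,lisa,0.6666666666666666]
==>[jane,10,thomas,50,wyd,0.11904761904761904]
==>[jane,10,thomas,90,jerryd,0.2142857142857143]
==>[jane,20,lisa,90,thomas,0.3157894736842105]
==>[jane,20,lisa,100,jerryd,0.3508771929824561]
==>[jane,20,lisa,90,thomas,50,wyd,0.11278195488721804]
==>[jane,20,lisa,90,thomas,90,jerryd,0.2030075187969925]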
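As a cross-check on the grouping, here is a hedged Python sketch that walks every simple path from the start vertex and adds each path's product to the vertex it ends on, then sorts to get a "top N" list. It assumes the example graph fits in a dict; `GRAPH` and `favors_from` are illustrative names of my own, not anything provided by Gremlin.

```python
from collections import defaultdict

# Example graph from the question: source -> {target: weight}
GRAPH = {
    "jane": {"thomas": 10, "lisa": 20},
    "lisa": {"thomas": 90, "jerryd": 100},
    "thomas": {"wyd": 50, "jerryd": 90},
    "wyd": {},
    "jerryd": {},
}

def favors_from(graph, start):
    """Total favor of every vertex reachable from start over simple paths."""
    totals = defaultdict(float)
    stack = [(start, (start,), 1.0)]  # (vertex, path so far, sack value)
    while stack:
        node, path, sack = stack.pop()
        out_total = sum(graph[node].values())
        for nxt, weight in graph[node].items():
            if nxt in path:
                continue  # simple paths only, so cycles cannot loop forever
            share = sack * weight / out_total
            totals[nxt] += share  # like emit() + group(): count every vertex reached
            stack.append((nxt, path + (nxt,), share))
    return dict(totals)

favors = favors_from(GRAPH, "jane")
ranked = sorted(favors.items(), key=lambda kv: kv[1], reverse=True)
for name, value in ranked:
    print(name, value)
```

Slicing `ranked[:n]` gives the top N. Note that enumerating simple paths can get expensive on large or densely connected graphs, in the sketch just as in the traversal.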

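The answer above is quite elegant and is the best fit for an environment involving Neptune and Python. I offer a second one for reference, in case others come across this question. From the moment I saw this question I could only picture it being solved with a VertexProgram in the OLAP style with a GraphComputer. As a result, I had a hard time thinking about it any other way. Of course, using a VertexProgram requires a JVM language like Java and will not work directly with Neptune. I suppose my closest workaround would be to use Java, grab a subgraph() from Neptune and then run the custom VertexProgram locally in TinkerGraph, which would be quite fast.

More generally, without the Python/Neptune requirement, converting the algorithm to a VertexProgram is not a bad approach, depending on the nature of the graph and the amount of data that needs to be traversed. As there is not much content out there on this topic, I thought I would offer the core of the code for it here. This is the guts of it: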

        @Override
        public void execute(final Vertex vertex,final Messenger<Double> messenger,final Memory memory) {
            // on the first pass calculate the "total favor" for all vertices
            // and pass the calculated current favor forward along incident edges
            // only for the "start vertex" 
            if (memory.isInitialIteration()) {
                copyHaltedTraversersFromMemory(vertex);

                final boolean startVertex = vertex.value("name").equals(nameOfStartVertrex);
                final double initialFavor = startVertex ? 1d : 0d;
                vertex.property(VertexProperty.Cardinality.single,FAVOR,initialFavor);
                vertex.property(VertexProperty.Cardinality.single,TOTAL_FAVOR,IteratorUtils.stream(vertex.edges(Direction.OUT)).mapToDouble(e -> e.value("weight")).sum());

                if (startVertex) {
                    final Iterator<Edge> incidents = vertex.edges(Direction.OUT);
                    memory.add(VOTE_TO_HALT,!incidents.hasNext());
                    while (incidents.hasNext()) {
                        final Edge incident = incidents.next();
                        messenger.sendMessage(MessageScope.Global.of(incident.inVertex()),(double) incident.value("weight") /  (double) vertex.value(TOTAL_FAVOR));
                    }
                }
            } else {
                // on future passes,sum all the incoming "favor" and add it to
                // the "favor" property of each vertex. then once again pass the
                // current favor to incident edges. this will keep happening 
                // until the message passing stops.
                final Iterator<Double> messages = messenger.receiveMessages();
                final boolean hasMessages = messages.hasNext();
                if (hasMessages) {
                    double adjacentFavor = IteratorUtils.reduce(messages,0.0d,Double::sum);
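                    vertex.property(VertexProperty.Cardinality.single,FAVOR,(double) vertex.value(FAVOR) + adjacentFavor);

                    final Iterator<Edge> incidents = vertex.edges(Direction.OUT);
                    memory.add(VOTE_TO_HALT,!incidents.hasNext());
                    while (incidents.hasNext()) {
                        final Edge incident = incidents.next();
                        messenger.sendMessage(MessageScope.Global.of(incident.inVertex()),adjacentFavor * ((double) incident.value("weight") / (double) vertex.value(TOTAL_FAVOR)));
                    }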
                }
            }
        }

The above is then executed as:

ComputerResult result = graph.compute().program(FavorVertexProgram.build().name("jane").create()).submit().get();
GraphTraversalSource rg = result.graph().traversal();
Traversal elements = rg.V().elementMap();

And the results of the "elements" traversal:

{id=0,label=person,^favor=1.0,name=jane,^totalFavor=30.0}
{id=2,label=person,^favor=0.6491228070175439,name=thomas,^totalFavor=140.0}
{id=4,label=person,^favor=0.6666666666666666,name=lisa,^totalFavor=190.0}
{id=6,label=person,^favor=0.23182957393483708,name=wyd,^totalFavor=0.0}
{id=8,label=person,^favor=0.768170426065163,name=jerryd,^totalFavor=0.0}
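For readers who do not have a JVM handy, the message-passing idea can be approximated in a few lines of Python. This is only a sketch under my own assumptions: the graph is a plain dict, `propagate_favor` is a made-up name, and the `max_iterations` cap is something I added so that the loop always stops on a cyclic graph. It is not a port of the VertexProgram API.

```python
# Example graph from the question: source -> {target: weight}
GRAPH = {
    "jane": {"thomas": 10, "lisa": 20},
    "lisa": {"thomas": 90, "jerryd": 100},
    "thomas": {"wyd": 50, "jerryd": 90},
    "wyd": {},
    "jerryd": {},
}

def propagate_favor(graph, start, max_iterations=100):
    """Push favor along out edges in rounds until no messages remain."""
    favor = {v: 0.0 for v in graph}
    favor[start] = 1.0
    total = {v: sum(out.values()) for v, out in graph.items()}

    # First round: only the start vertex sends its proportional shares.
    inbox = {}
    for nxt, weight in graph[start].items():
        inbox[nxt] = inbox.get(nxt, 0.0) + weight / total[start]

    for _ in range(max_iterations):  # assumed cap, guards against cycles
        if not inbox:
            break  # no messages were sent, so every vertex has halted
        outbox = {}
        for vertex, incoming in inbox.items():
            favor[vertex] += incoming  # sum the incoming favor
            for nxt, weight in graph[vertex].items():
                # pass the newly received favor on, split by edge weight
                outbox[nxt] = outbox.get(nxt, 0.0) + incoming * weight / total[vertex]
        inbox = outbox
    return favor

favor = propagate_favor(GRAPH, "jane")
for name, value in favor.items():
    print(name, value)
```

On the example graph this should arrive at the same favor values as the results listed above. Unlike the simplePath() traversal, message passing does not track visited vertices, so on a graph with cycles the numbers can differ from the path-based query.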

amazon-neptune gremlin gremlinpython tinkerpop3