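Problem description
I have JSON entities like the one below, which I insert into a graph as edges and vertices. As you can see, each entity is already in a highly relational format.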
person = {
    "summary": "Unix System Administrator at National Bank of Canada",
    "id": "P6ZiIHhJ-PhON9W6UgeFwfA",
    "name": "Patrick",
    "type": "Person",
    "employments": [
        {
            "isCurrent": True,
            "employer": {
                "Name": "Commercial bank located in Canada",
                "type": "Corporation"
            },
            "title": "Unix System Administrator"
        }
    ],
    "skills": [
        {"name": "string"}
    ],
    "locations": [
        {
            "country": {"name": "Canada", "type": "AdministrativeArea"}
        }
    ],
    "someVertex": {"k": "v"}
}
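My problem is that, in the future, I may receive new JSON for the same person. If anything has changed, I need to "update" the person in the graph and make sure that child vertices which no longer exist are deleted. It is a bit like an upsert, but across all child vertices and edges.
Right now, I add the root ID as a property on every child element so that I can find them all and delete them. Is there another way?
My actual process: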
from gremlin_python.process.traversal import Cardinality

def add_vertex(g, label, dct, entity_id):
    # Tag every vertex with the root entity's ID so the whole tree can be found later
    vertex = g.addV(label).property('entity_id', entity_id)
    # add_properties executes the traversal and returns the created vertex
    return add_properties(g, vertex, dct, entity_id)

def add_properties(g, vertex, dct, entity_id):
    # Add scalar values, and lists of scalars, as properties
    for k, v in dct.items():
        if type(v) in [str, bool, int, float]:
            vertex = vertex.property(k, v)
        elif v and isinstance(v, list) and type(v[0]) in [str, bool, int, float]:
            for literal in v:
                vertex = vertex.property(Cardinality.set_, k, literal)
    vertex = vertex.next()
    # Add child vertices and the edges to them;
    # the dict key serves as both the child label and the edge name
    for k, v in dct.items():
        if isinstance(v, dict):
            nested_vertex = add_vertex(g, k, v, entity_id)
            add_edge(g, k, vertex, nested_vertex, entity_id)
        elif v and isinstance(v, list) and isinstance(v[0], dict):
            for nested_v in v:
                nested_vertex = add_vertex(g, k, nested_v, entity_id)
                add_edge(g, k, vertex, nested_vertex, entity_id)
    return vertex

def add_edge(g, name, from_v, to_v, entity_id):
    g.addE(name).property('entity_id', entity_id).from_(from_v).to(to_v).iterate()

add_vertex(g, 'Person', person, person['id'])
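To see what the flow above produces without connecting to a graph, here is a minimal dry-run sketch in plain Python. It is not part of the original code: `flatten`, `SCALARS`, and the integer vertex IDs are hypothetical names introduced for illustration, and it assumes, as above, that the dict key is used as both the child label and the edge name.

```python
from itertools import count

SCALARS = (str, bool, int, float)

def flatten(label, dct, entity_id, vertices, edges, ids=None):
    """Record one vertex for dct, then recurse into nested dicts."""
    ids = ids if ids is not None else count()
    vid = next(ids)
    props = {"entity_id": entity_id}
    for k, v in dct.items():
        if isinstance(v, SCALARS):
            props[k] = v
        elif v and isinstance(v, list) and isinstance(v[0], SCALARS):
            props[k] = set(v)  # stands in for Cardinality.set_
    vertices.append((vid, label, props))
    for k, v in dct.items():
        # A nested dict is one child; a list may hold several dict children
        children = [v] if isinstance(v, dict) else v if isinstance(v, list) else []
        for child in children:
            if isinstance(child, dict):
                child_id = flatten(k, child, entity_id, vertices, edges, ids)
                edges.append((vid, k, child_id))
    return vid

# Reduced sample entity, made up for this sketch
sample = {"id": "p1", "name": "Patrick",
          "skills": [{"name": "string"}],
          "someVertex": {"k": "v"}}
vertices, edges = [], []
flatten("Person", sample, sample["id"], vertices, edges)
print(len(vertices), len(edges))  # 3 2
```

Every recorded vertex carries the same entity_id, which is what makes the later "find everything with this entity_id and drop it" step possible.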
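If I receive a person with the same ID, and the "someVertex" vertex has now disappeared from the dict, how can I "upsert" the whole tree of vertices and edges that originated from this person so that this vertex is removed? For now, I delete every element carrying the "entity_id" property added in the previous step.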
if g.V().has(entity_type, 'id', entity_id).hasNext():
    g.V().has('entity_id', entity_id).drop().iterate()
add_vertex(g, entity_type, entity, entity_id)
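Solution
Adding an "entity_id" property to every vertex is not a terrible way to find all the vertices to delete. A more graph-oriented approach is to simply follow the edges from the parent and recursively drop every vertex you find: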
gremlin> g = TinkerGraph.open().traversal()
==>graphtraversalsource[tinkergraph[vertices:0 edges:0],standard]
gremlin> g.addV().property(id,'A').as('a').
......1> addV().property(id,'B').as('b').
......2> addV().property(id,'C').as('c').
......3> addV().property(id,'E').as('e').
......4> addV().property(id,'F').as('f').
......5> addE('hasParent').from('a').to('b').
......6> addE('hasParent').from('b').to('c').
......7> addE('hasParent').from('c').to('e').
......8> addE('hasParent').from('e').to('f').iterate()
gremlin> g.V().has(id,'B').
......1> emit().
......2> repeat(out()).
......3> aggregate('x').
......4> select('x').unfold().
......5> drop()
gremlin> g.V().elementMap()
==>[id:A,label:vertex]
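As a rough mental model of that traversal, the following plain-Python sketch does the same thing on an in-memory adjacency dict: it first collects the start vertex and everything reachable over outgoing edges (the role of emit().repeat(out()).aggregate('x')), and only then deletes. The names `collect_subtree` and `drop_subtree` and the dict-based graph are assumptions made for illustration; this is not gremlin_python code.

```python
def collect_subtree(out_edges, start):
    """Return start plus every vertex reachable over outgoing edges."""
    seen, stack = [], [start]
    while stack:
        v = stack.pop()
        if v in seen:
            continue
        seen.append(v)
        stack.extend(out_edges.get(v, []))
    return seen

def drop_subtree(out_edges, start):
    """Collect first, then delete the vertices and any edges pointing at them."""
    doomed = set(collect_subtree(out_edges, start))
    for v in doomed:
        out_edges.pop(v, None)
    for v in list(out_edges):
        out_edges[v] = [t for t in out_edges[v] if t not in doomed]
    return doomed

# Same shape as the console example: A -> B -> C -> E -> F
graph = {"A": ["B"], "B": ["C"], "C": ["E"], "E": ["F"], "F": []}
removed = drop_subtree(graph, "B")
print(sorted(removed), graph)  # ['B', 'C', 'E', 'F'] {'A': []}
```

As in the console session, only vertex A survives, and it no longer has an edge to B.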
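I chose to aggregate() first because I think most graphs tend to prefer that approach (rather than dropping as you traverse), but you could also try dropping as you go to avoid collecting the side-effect "x".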
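There are ways to update graph structure with truer upsert-style semantics. You have a fairly robust tree structure, though, so I suspect that would make for some rather chunky and complex Gremlin. In your case it probably makes the most sense to just delete everything and add it back, though it is hard to say. This upsert pattern is described in many places on StackOverflow and elsewhere.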
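To give a sense of what an upsert-style update would have to work out before touching the graph, here is a small plain-Python sketch that compares two versions of an entity and reports which nested children were removed, added, or changed. It assumes the previous JSON is still available to compare against, which the question does not state, and `diff_children` is a hypothetical helper, not an existing API.

```python
def diff_children(old, new):
    """Compare the nested (dict or list-of-dict) keys of two entity versions."""
    def child_keys(dct):
        return {k for k, v in dct.items()
                if isinstance(v, dict)
                or (isinstance(v, list) and v and isinstance(v[0], dict))}
    old_keys, new_keys = child_keys(old), child_keys(new)
    return {
        "removed": sorted(old_keys - new_keys),
        "added": sorted(new_keys - old_keys),
        "changed": sorted(k for k in old_keys & new_keys if old[k] != new[k]),
    }

# Reduced sample versions of the same person, made up for this sketch
old = {"id": "p1", "skills": [{"name": "string"}], "someVertex": {"k": "v"}}
new = {"id": "p1", "skills": [{"name": "python"}]}
result = diff_children(old, new)
print(result)
# {'removed': ['someVertex'], 'added': [], 'changed': ['skills']}
```

Each of the three buckets would then need its own Gremlin, applied recursively for deeper nesting, which is the kind of complexity that makes delete-and-re-add attractive for a tree like this.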