Problem description
a = [
    {'EntityType1': 'switch', 'Entity1': 'X450-G2', 'Relation': 'hasFeatures', 'EntityType2': 'features', 'Entity2': 'Role-Based Policy'},
    {'EntityType1': 'switch', 'Entity1': 'X450-G2', 'Relation': 'hasLocation', 'EntityType2': 'Location', 'Entity2': 'WallJack'},
    {'EntityType1': 'switch', 'Entity1': 'ers 3600', 'Relation': 'hasFeatures', 'EntityType2': 'features', 'Entity2': 'ExtremeXOS'},
    {'EntityType1': 'router', 'Entity1': 'slx 9540', 'Relation': 'hasFeatures', 'EntityType2': 'features', 'Entity2': 'ExtremeXOS'},
    {'EntityType1': 'router', 'Entity1': 'slx 9540', 'Relation': 'hasLocation', 'EntityType2': 'Location', 'Entity2': 'Chasis'},
]
The expected output looks like this:
:X450-G2 a :switch.
:X450-G2 :hasFeatures "Role-Based Policy".
:X450-G2 :hasLocation "WallJack".
:ers3600 a :switch.
:ers3600 :hasFeatures "ExtremeXOS".
:slx9540 a :router.
:slx9540 :hasFeatures "ExtremeXOS".
:slx9540 :hasLocation "Chasis".
This is what I tried:
slx = 0
X450 = 0
ers3600 = 0
for i in a:
    entity_type1 = i["EntityType1"]
    entity1 = i["Entity1"]
    entity_type2 = i["EntityType2"]
    entity2 = i["Entity2"]
    relation = i["Relation"]
    if 'switch' in entity_type1 or entity_type2:
        if entity1 == 'X450-G2' and X450 <= 0:
            X450 += 1
            sd_line1 = ""
            sd_line2 = ""
            sd_line1 = ":" + entity1 + " a " + ":" + entity_type1 + "."
            relation = ":" + relation
            sd_line2 = "\n" ":" + entity1 + " " + relation + " \"" + entity2 + "\"."
            sd_line3 = sd_line1 + sd_line2
            print(sd_line3)
        if entity1 == 'ers 3600' and ers3600 <= 0:
            ers3600 += 1
            print("\n\n")
            sd_line1 = ""
            sd_line2 = ""
            sd_line1 = ":" + entity1 + " a " + ":" + entity_type1 + "."
            relation = ":" + relation
            sd_line2 = "\n" ":" + entity1 + " " + relation + " \"" + entity2 + "\"."
            sd_line3 = sd_line1 + sd_line2
            print(sd_line3)
        else:
            sd_line2 = ""
            relation = ":" + relation
            sd_line2 = ":" + entity1 + " " + relation + " \"" + entity2 + "\"."
            sd_line3 = sd_line2
            print(sd_line3)
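I am new to Python and cannot get the expected result. Is there a more Pythonic way to solve this? Here is the goal. If a record from the input has switch as its EntityType1, we should get the following output: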
:X450-G2 a :switch.
:X450-G2 :hasFeatures "Role-Based Policy"
If a second record again has switch as EntityType1 and the same Entity1, i.e. X450-G2, its triple should be appended to the RDF triples already created for that entity:
:X450-G2 a :switch.
:X450-G2 :hasFeatures "Role-Based Policy"
:X450-G2 :hasLocation "WallJack".
If a new record has switch as EntityType1 but a new entity, e.g. ers3600, its triples should start on a new line, like this:
:X450-G2 a :switch.
:X450-G2 :hasFeatures "Role-Based Policy".
:X450-G2 :hasLocation "WallJack".
:ers3600 a :switch.
:ers3600 :hasFeatures "ExtremeXOS".
If a new record has a new entity type, router, with a new entity such as slx 9540, its triples should again start on a new line, and so on:
:X450-G2 a :switch.
:X450-G2 :hasFeatures "Role-Based Policy".
:X450-G2 :hasLocation "WallJack".
:ers3600 a :switch.
:ers3600 :hasFeatures "ExtremeXOS".
:slx9540 a :router.
:slx9540 :hasFeatures "ExtremeXOS".
:slx9540 :hasLocation "Chasis".
Solution
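Some advice: in a conversion workflow like this, try to keep the main steps separate, for example: loading data from a system, parsing data in one format, extracting, transforming, serializing to another format, and loading into another system.
In your code sample, the extraction, transformation and serialization steps are all mixed together. Separating them makes the code easier to read, and therefore easier to maintain or reuse.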
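Below I give you two solutions: the first extracts the data into a simple dict-based subject-predicate-object graph, and the second extracts it into a real RDF graph.
In both cases you will see that I keep the extraction/transformation step (which returns a graph) apart from the serialization step (which consumes that graph), so that each is more reusable: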
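- The dict-based conversion is implemented with a plain dict or with a defaultdict. The serialization step is the same for both.
- The rdflib.Graph-based conversion is shared by two serializations: one to your format, and one to any serialization that rdflib.Graph supports.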
This builds a simple dict-based graph from your list of dictionaries a:
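graph = {}
for e in a:
    subj = e["Entity1"]
    # Create the inner dict only once per subject, so that earlier
    # triples for the same subject are not overwritten.
    graph.setdefault(subj, {})
    # :Entity1 a :EntityType1.
    obj = e["EntityType1"]
    graph[subj]["a"] = obj
    # :Entity1 :Relation "Entity2".
    pred, obj = e["Relation"], e["Entity2"]
    graph[subj][pred] = obj
print(graph)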
which prints:
{'X450-G2': {'a': 'switch', 'hasFeatures': 'Role-Based Policy', 'hasLocation': 'WallJack'}, 'ers 3600': {'a': 'switch', 'hasFeatures': 'ExtremeXOS'}, 'slx 9540': {'a': 'router', 'hasFeatures': 'ExtremeXOS', 'hasLocation': 'Chasis'}}
Or, in a shorter form, with defaultdict:
from collections import defaultdict
graph = defaultdict(dict)
for e in a:
    subj = e["Entity1"]
    # :Entity1 a :EntityType1.
    graph[subj]["a"] = e["EntityType1"]
    # :Entity1 :Relation "Entity2".
    graph[subj][e["Relation"]] = e["Entity2"]
print(graph)
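One thing to keep in mind with this layout: each predicate maps to a single object, so a later record with the same subject and relation silently overwrites the earlier one. That never happens with the data in the question, but if it could happen in yours, a possible variation is to keep a list of objects per predicate. The following is only a sketch; the names records and multi_graph and the second hasFeatures record are made up for illustration.

```python
from collections import defaultdict

# Hypothetical sample records: X450-G2 has two values for hasFeatures.
records = [
    {"EntityType1": "switch", "Entity1": "X450-G2",
     "Relation": "hasFeatures", "Entity2": "Role-Based Policy"},
    {"EntityType1": "switch", "Entity1": "X450-G2",
     "Relation": "hasFeatures", "Entity2": "ExtremeXOS"},
]

# subject -> predicate -> list of objects
multi_graph = defaultdict(lambda: defaultdict(list))
for rec in records:
    subj = rec["Entity1"]
    types = multi_graph[subj]["a"]
    if rec["EntityType1"] not in types:  # record each type only once
        types.append(rec["EntityType1"])
    multi_graph[subj][rec["Relation"]].append(rec["Entity2"])

print(dict(multi_graph["X450-G2"]))
# {'a': ['switch'], 'hasFeatures': ['Role-Based Policy', 'ExtremeXOS']}
```

A serializer for this variant would need an extra inner loop over each list of objects.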
This prints the subject predicate object. triples from the graph:
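def normalize(text):
    # Strip spaces so that "ers 3600" becomes the identifier "ers3600".
    return text.replace(' ', '')

for subj, po in graph.items():
    subj = normalize(subj)
    po = dict(po)  # work on a copy so that pop() does not alter the graph
    # :Entity1 a :EntityType1.
    print(':{} a :{}.'.format(subj, po.pop("a")))
    for pred, obj in po.items():
        # :Entity1 :Relation "Entity2".
        print(':{} :{} "{}".'.format(subj, pred, obj))
    print()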
which prints:
:X450-G2 a :switch.
:X450-G2 :hasFeatures "Role-Based Policy".
:X450-G2 :hasLocation "WallJack".
:ers3600 a :switch.
:ers3600 :hasFeatures "ExtremeXOS".
:slx9540 a :router.
:slx9540 :hasFeatures "ExtremeXOS".
:slx9540 :hasLocation "Chasis".
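To make the separation of steps explicit, the two snippets above can be wrapped into functions. This is a minimal sketch rather than a definitive implementation; the function names extract_graph and to_lines and the one-record sample list are assumptions made for illustration.

```python
from collections import defaultdict


def extract_graph(records):
    """Extract/transform step: records -> {subject: {predicate: object}}."""
    graph = defaultdict(dict)
    for rec in records:
        subj = rec["Entity1"]
        graph[subj]["a"] = rec["EntityType1"]
        graph[subj][rec["Relation"]] = rec["Entity2"]
    return dict(graph)


def to_lines(graph):
    """Serialization step: graph -> list of text lines (nothing is printed)."""
    lines = []
    for subj, po in graph.items():
        name = subj.replace(" ", "")  # same idea as normalize() above
        lines.append(":{} a :{}.".format(name, po["a"]))
        for pred, obj in po.items():
            if pred != "a":
                lines.append(':{} :{} "{}".'.format(name, pred, obj))
    return lines


# Hypothetical one-record input, just to exercise both steps.
sample = [
    {"EntityType1": "switch", "Entity1": "ers 3600",
     "Relation": "hasFeatures", "Entity2": "ExtremeXOS"},
]
print("\n".join(to_lines(extract_graph(sample))))
# :ers3600 a :switch.
# :ers3600 :hasFeatures "ExtremeXOS".
```

Because to_lines returns the lines instead of printing them, the same output can be written to a file or checked in a test without touching the extraction code.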
This builds a real RDF graph using the rdflib library:
from rdflib import Graph, Literal, URIRef
from rdflib.namespace import RDF

A = RDF.type
graph = Graph()
for d in a:
    # normalize() is the helper defined in the previous snippet.
    subj = URIRef(normalize(d["Entity1"]))
    # :Entity1 a :EntityType1.
    graph.add((
        subj, A, URIRef(normalize(d["EntityType1"]))
    ))
    # :Entity1 :Relation "Entity2".
    graph.add((
        subj, URIRef(normalize(d["Relation"])), Literal(d["Entity2"])
    ))
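This:
print(graph.serialize(format="n3"))
prints the graph in the N3 serialization format (on rdflib versions before 6.0, where serialize() returns bytes, append .decode("utf-8")):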
<X450-G2> a <switch> ;
<hasFeatures> "Role-Based Policy" ;
<hasLocation> "WallJack" .
<ers3600> a <switch> ;
<hasFeatures> "ExtremeXOS" .
<slx9540> a <router> ;
<hasFeatures> "ExtremeXOS" ;
<hasLocation> "Chasis" .
This queries the graph to print it in your format:
for subj in set(graph.subjects()):
    po = dict(graph.predicate_objects(subj))
    # :Entity1 a :EntityType1.
    print(":{} a :{}.".format(subj, po.pop(A)))
    for pred, obj in po.items():
        # :Entity1 :Relation "Entity2".
        print(':{} :{} "{}".'.format(subj, pred, obj))
    print()