Converting a list of dictionaries to RDF format

Problem description

Goal: automation. When the list of dictionaries is large, I want to generate the data in a specific format. This is the input:
a = [
    {'EntityType1': 'switch', 'Entity1': 'X450-G2', 'Relation': 'hasFeatures', 'EntityType2': 'features', 'Entity2': 'Role-Based Policy'},
    {'EntityType1': 'switch', 'Entity1': 'X450-G2', 'Relation': 'hasLocation', 'EntityType2': 'Location', 'Entity2': 'WallJack'},
    {'EntityType1': 'switch', 'Entity1': 'ers 3600', 'Relation': 'hasFeatures', 'EntityType2': 'features', 'Entity2': 'ExtremeXOS'},
    {'EntityType1': 'router', 'Entity1': 'slx 9540', 'Relation': 'hasFeatures', 'EntityType2': 'features', 'Entity2': 'ExtremeXOS'},
    {'EntityType1': 'router', 'Entity1': 'slx 9540', 'Relation': 'hasLocation', 'EntityType2': 'Location', 'Entity2': 'Chasis'},
]

The expected output looks like this:

:X450-G2 a :switch.
:X450-G2 :hasFeatures "Role-Based Policy".
:X450-G2 :hasLocation "WallJack".

:ers3600 a :switch.
:ers3600 :hasFeatures "ExtremeXOS".

:slx9540 a :router.
:slx9540 :hasFeatures "ExtremeXOS".
:slx9540 :hasLocation "Chasis".

This is what I have tried:

slx = 0
X450 = 0
ers3600 = 0
for i in a:
    entity_type1 = i["EntityType1"]
    entity1 = i["Entity1"]
    entity_type2 = i["EntityType2"]
    entity2 = i["Entity2"]
    relation = i["Relation"]
    if 'switch' in entity_type1 or entity_type2:
        if entity1 == 'X450-G2' and X450 <= 0 :
            
            X450 +=1
            sd_line1 = ""
            sd_line2 = ""
            sd_line1 = ":" + entity1 + " a " + ":" + entity_type1 + "."
            relation = ":"+relation
            sd_line2 ="\n"  ":" + entity1 + " " + relation + " \"" + entity2 + "\"."
            sd_line3 = sd_line1 + sd_line2
            print(sd_line3)
        
            
        if entity1 == 'ers 3600' and ers3600<=0:
            ers3600 +=1
            print("\n\n")
            sd_line1 = ""
            sd_line2 = ""
            sd_line1 = ":" + entity1 + " a " + ":" + entity_type1 + "."
            relation = ":"+relation
            sd_line2 ="\n"  ":" + entity1 + " " + relation + " \"" + entity2 + "\"."
            sd_line3 = sd_line1 + sd_line2
            print(sd_line3)
        else:
            sd_line2 = ""
            relation = ":" + relation
            sd_line2 =":" + entity1 + " " + relation + " \"" + entity2 + "\"."
            sd_line3 =  sd_line2
            print(sd_line3)

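I am new to Python and I can't get the expected result. Is there a Pythonic way to solve this? Here is the goal. If a record in the input has switch as EntityType1, we should get the following output: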

:X450-G2 a :switch.
:X450-G2 :hasFeatures "Role-Based Policy".

And if the second record again has switch as EntityType1 and the same Entity1, i.e. X450-G2, it should be appended to the first RDF triples that were created:

:X450-G2 a :switch.
:X450-G2 :hasFeatures "Role-Based Policy".
:X450-G2 :hasLocation "WallJack".

If a new record has switch as EntityType1 but a new Entity1, e.g. ers3600, it should be created on a new line, like this:

:X450-G2 a :switch.
:X450-G2 :hasFeatures "Role-Based Policy".
:X450-G2 :hasLocation "WallJack".

:ers3600 a :switch.
:ers3600 :hasFeatures "ExtremeXOS".

If a new record has a new EntityType1 (router) and a new Entity1 (slx 9540), it should also be created on a new line, and so on:

:X450-G2 a :switch.
:X450-G2 :hasFeatures "Role-Based Policy".
:X450-G2 :hasLocation "WallJack".

:ers3600 a :switch.
:ers3600 :hasFeatures "ExtremeXOS".

:slx9540 a :router.
:slx9540 :hasFeatures "ExtremeXOS".
:slx9540 :hasLocation "Chasis".

Solution

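Some advice: when working on a conversion workflow like this one, try to keep the main steps separate, e.g. loading from a system, parsing data in one format, extracting, transforming, serializing to another format, and loading into another system.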

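In your code sample, you mix the extraction, transformation and serialization steps. Separating them will make your code easier to read, and therefore easier to maintain or reuse.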
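Separately, one concrete bug in your code is the condition `if 'switch' in entity_type1 or entity_type2:`. Python reads it as `('switch' in entity_type1) or entity_type2`, so it is true whenever `entity_type2` is a non-empty string, whatever `entity_type1` contains. A small illustration (`buggy_check` and `fixed_check` are hypothetical names used only for this demonstration):

```python
def buggy_check(entity_type1, entity_type2):
    # Parsed as ('switch' in entity_type1) or entity_type2:
    # any non-empty entity_type2 makes the whole expression truthy.
    return bool('switch' in entity_type1 or entity_type2)


def fixed_check(entity_type1, entity_type2):
    # Each operand needs its own membership test.
    return 'switch' in entity_type1 or 'switch' in entity_type2


print(buggy_check('router', 'Location'))  # True, although neither type is 'switch'
print(fixed_check('router', 'Location'))  # False
```

Note also that `in` on strings is a substring test; if you mean an exact match, `entity_type1 == 'switch'` is the more precise comparison.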

Below, I give you two solutions: the first extracts the data into a simple dict-based subject-predicate-object graph, the second extracts it into a real RDF graph.

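In both cases, you will see that I keep the extraction/transformation step (which returns a graph) separate from the serialization step (which consumes that graph), which makes both more reusable: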

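  • The dict-based transformation is implemented both with a plain dict and with a defaultdict. The serialization step is common to both.

  • The rdflib.Graph-based transformation is shared by two serializations: one in your format, the other in any of the serialization formats available for an rdflib.Graph.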
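To make that separation concrete, here is a minimal sketch that splits the work into three small functions: parsing (assuming, hypothetically, that the records arrive as a JSON string), extraction into a dict-based graph, and serialization to a string rather than printing, so the result is easy to test or write to a file. The names `parse_records`, `extract_graph` and `serialize_graph` are illustrative, not from any library.

```python
import json
from collections import defaultdict


def parse_records(text):
    # Parsing step: assumes (hypothetically) the records arrive as a JSON array.
    return json.loads(text)


def extract_graph(records):
    # Extraction/transformation step: subject -> {predicate: object}.
    graph = defaultdict(dict)
    for e in records:
        subj = e["Entity1"]
        graph[subj]["a"] = e["EntityType1"]
        graph[subj][e["Relation"]] = e["Entity2"]
    return graph


def serialize_graph(graph):
    # Serialization step: returns the text instead of printing it.
    blocks = []
    for subj, po in graph.items():
        s = subj.replace(" ", "")
        lines = [":{} a :{}.".format(s, po["a"])]
        for pred, obj in po.items():
            if pred != "a":
                lines.append(':{} :{} "{}".'.format(s, pred, obj))
        blocks.append("\n".join(lines))
    # One blank line between subjects, trailing newline at the end.
    return "\n\n".join(blocks) + "\n"


text = '[{"EntityType1": "switch", "Entity1": "ers 3600", "Relation": "hasFeatures", "Entity2": "ExtremeXOS"}]'
print(serialize_graph(extract_graph(parse_records(text))), end="")
```

Each function can be reused or tested on its own, and swapping `serialize_graph` for another serializer does not touch the extraction code.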


This builds a simple dict-based graph from your list a:

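graph = {}

for e in a:
    subj = e["Entity1"]
    # Create the entry only for a new subject, so that a later record for
    # the same subject adds to it instead of overwriting it.
    if subj not in graph:
        graph[subj] = {}

    # :Entity1 a :EntityType1.
    obj = e["EntityType1"]
    graph[subj]["a"] = obj

    # :Entity1 :Relation "Entity2".
    pred, obj = e["Relation"], e["Entity2"]
    graph[subj][pred] = obj

print(graph)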

Like this:

{'X450-G2': {'a': 'switch', 'hasFeatures': 'Role-Based Policy', 'hasLocation': 'WallJack'}, 'ers 3600': {'a': 'switch', 'hasFeatures': 'ExtremeXOS'}, 'slx 9540': {'a': 'router', 'hasFeatures': 'ExtremeXOS', 'hasLocation': 'Chasis'}}

Or, in a shorter form, with a defaultdict:

from collections import defaultdict

graph = defaultdict(dict)

for e in a:
    subj = e["Entity1"]
    
    # :Entity1 a :EntityType1.
    graph[subj]["a"] = e["EntityType1"]  

    # :Entity1 :Relation "Entity2".    
    graph[subj][e["Relation"]] = e["Entity2"]  

print(graph)

This prints the subject predicate object. triples from the graph:

def normalize(text):
    # Remove spaces, e.g. "ers 3600" -> "ers3600"
    return text.replace(' ', '')

for subj, po in graph.items():
    subj = normalize(subj)
    # Work on a copy so that popping "a" does not modify the graph itself
    po = dict(po)

    # :Entity1 a :EntityType1.
    print(':{} a :{}.'.format(subj, po.pop("a")))

    for pred, obj in po.items():
        # :Entity1 :Relation "Entity2".
        print(':{} :{} "{}".'.format(subj, pred, obj))

    print()

Like this:

:X450-G2 a :switch.
:X450-G2 :hasFeatures "Role-Based Policy".
:X450-G2 :hasLocation "WallJack".

:ers3600 a :switch.
:ers3600 :hasFeatures "ExtremeXOS".

:slx9540 a :router.
:slx9540 :hasFeatures "ExtremeXOS".
:slx9540 :hasLocation "Chasis".
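One limitation of the dict-based graph: each predicate maps to a single object, so if an entity had two records with the same relation (say, two hasFeatures rows), the second would silently overwrite the first. A minimal variant, assuming you want to keep all such values, stores a list of objects per predicate. The function names are hypothetical, and the second feature in the usage example is made-up data for illustration only.

```python
from collections import defaultdict


def build_multi_graph(records):
    # subject -> predicate -> list of objects, in first-seen order
    graph = defaultdict(lambda: defaultdict(list))
    for e in records:
        po = graph[e["Entity1"]]
        # Add each (predicate, object) pair once, skipping exact repeats
        for pred, obj in (("a", e["EntityType1"]), (e["Relation"], e["Entity2"])):
            if obj not in po[pred]:
                po[pred].append(obj)
    return graph


def serialize_multi_graph(graph):
    blocks = []
    for subj, po in graph.items():
        s = subj.replace(" ", "")
        # "a" is always present, because it is filled first for every record
        lines = [":{} a :{}.".format(s, t) for t in po["a"]]
        for pred, objs in po.items():
            if pred == "a":
                continue
            lines.extend(':{} :{} "{}".'.format(s, pred, obj) for obj in objs)
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks) + "\n"


records = [
    {"Entity1": "X450-G2", "EntityType1": "switch",
     "Relation": "hasFeatures", "Entity2": "Role-Based Policy"},
    # Hypothetical second feature for the same switch, for illustration only
    {"Entity1": "X450-G2", "EntityType1": "switch",
     "Relation": "hasFeatures", "Entity2": "SomeOtherFeature"},
]
print(serialize_multi_graph(build_multi_graph(records)), end="")
```

An rdflib graph does not have this limitation, since an RDF graph is a set of triples and can hold several objects for the same subject and predicate; converting `predicate_objects()` to a dict, as in the printing loop below, would however keep only one of them again.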

This builds a real RDF graph with the rdflib library:

from rdflib import Graph, Literal, URIRef
from rdflib.namespace import RDF

A = RDF.type
graph = Graph()

# normalize() is the helper defined above
for d in a:
    subj = URIRef(normalize(d["Entity1"]))

    # :Entity1 a :EntityType1.
    graph.add((
        subj, A, URIRef(normalize(d["EntityType1"]))
    ))

    # :Entity1 :Relation "Entity2".
    graph.add((
        subj, URIRef(normalize(d["Relation"])), Literal(d["Entity2"])
    ))

This:

# rdflib >= 6 returns a str here; older versions return bytes,
# which need .decode("utf-8")
print(graph.serialize(format="n3"))

prints the graph in the N3 serialization format:

<X450-G2> a <switch> ;
    <hasFeatures> "Role-Based Policy" ;
    <hasLocation> "WallJack" .

<ers3600> a <switch> ;
    <hasFeatures> "ExtremeXOS" .

<slx9540> a <router> ;
    <hasFeatures> "ExtremeXOS" ;
    <hasLocation> "Chasis" .

This queries the graph and prints it in your format:

for subj in set(graph.subjects()):
    po = dict(graph.predicate_objects(subj))

    # :Entity1 a :EntityType1.
    print(":{} a :{}.".format(subj, po.pop(A)))

    for pred, obj in po.items():
        # :Entity1 :Relation "Entity2".
        print(':{} :{} "{}".'.format(subj, pred, obj))

    print()
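Note that your target format is Turtle syntax, but a document like `:X450-G2 a :switch.` only parses as Turtle if the empty prefix `:` is declared with an `@prefix : <...> .` line, and literal values containing `"`, `\` or line breaks must be escaped. Below is a minimal sketch for the dict-based graph, assuming a placeholder base IRI `http://example.org/` and a hypothetical `to_local_name` helper that, like `normalize`, drops spaces but also replaces any other character outside letters, digits, `_` and `-` with `_`.

```python
import re


def to_local_name(text):
    # Hypothetical helper: drop spaces (like normalize), then replace any
    # character outside [A-Za-z0-9_-] so the result is a safe local name.
    name = re.sub(r"[^A-Za-z0-9_-]", "_", text.replace(" ", ""))
    # A Turtle local name may not start with '-'.
    return "_" + name if name.startswith("-") else name


def escape_literal(text):
    # Escape the characters that cannot appear raw in a "..." Turtle string.
    return (text.replace("\\", "\\\\").replace('"', '\\"')
                .replace("\n", "\\n").replace("\r", "\\r"))


def to_turtle(graph, base="http://example.org/"):
    # graph: subject -> {predicate: object}, as built by the dict-based code.
    # base is a placeholder IRI; substitute your own namespace.
    out = ["@prefix : <{}> .".format(base), ""]
    for subj, po in graph.items():
        s = to_local_name(subj)
        # A space before the final "." keeps it clearly separate from the name.
        out.append(":{} a :{} .".format(s, to_local_name(po["a"])))
        for pred, obj in po.items():
            if pred != "a":
                out.append(':{} :{} "{}" .'.format(
                    s, to_local_name(pred), escape_literal(obj)))
        out.append("")
    return "\n".join(out)


print(to_turtle({"ers 3600": {"a": "switch", "hasFeatures": "ExtremeXOS"}}), end="")
```

The output starts with `@prefix : <http://example.org/> .`, followed by the same triples as before, so it can be loaded by any Turtle parser.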