Converting spaCy BILUO format to spaCy JSON format

Problem

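I am trying to upgrade my spaCy version to the nightly build, mainly so that I can use spaCy transformers.

So I converted a simple fake training dataset in spaCy's simple train format, like this: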

td = [["Who is Shaka Khan?", {"entities": [(7, 17, "FRIENDS")]}], ["I like London.", {"entities": [(7, 13, "LOC")]}]]

into this:

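[[{"head": 0, "dep": "", "tag": "", "orth": "Who", "ner": "O", "id": 0},
  {"head": 0, "dep": "", "tag": "", "orth": "is", "ner": "O", "id": 1},
  {"head": 0, "dep": "", "tag": "", "orth": "Shaka", "ner": "B-FRIENDS", "id": 2},
  {"head": 0, "dep": "", "tag": "", "orth": "Khan", "ner": "L-FRIENDS", "id": 3},
  {"head": 0, "dep": "", "tag": "", "orth": "?", "ner": "O", "id": 4}],
 [{"head": 0, "dep": "", "tag": "", "orth": "I", "ner": "O", "id": 0},
  {"head": 0, "dep": "", "tag": "", "orth": "like", "ner": "O", "id": 1},
  {"head": 0, "dep": "", "tag": "", "orth": "London", "ner": "U-LOC", "id": 2},
  {"head": 0, "dep": "", "tag": "", "orth": ".", "ner": "O", "id": 3}]]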

using the following script:

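```
import json

import spacy
from spacy.training import offsets_to_biluo_tags

# Assumption: the question does not show how nlp was created;
# a blank English pipeline is enough for tokenization.
nlp = spacy.blank("en")

sentences = []
for t in td:
    doc = nlp(t[0])
    # One BILUO tag per token, derived from the character offsets
    tags = offsets_to_biluo_tags(doc, t[1]["entities"])
    tokens = []
    for n, (tok, tag) in enumerate(zip(doc, tags)):
        token = {"head": 0, "dep": "", "tag": "", "orth": tok.orth_, "ner": tag, "id": n}
        tokens.append(token)
    sentences.append(tokens)

with open("train_data.json", "w") as js:
    json.dump(sentences, js)
```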


Then I tried to convert this train_data.json using spaCy's convert command:

```python -m spacy convert train_data.json converted/```


But the output in the converted folder is:

```✔ Generated output file (0 documents): converted/train_data.spacy``` 

which means no dataset was created.

Can anybody help with what I am missing?

I am trying to do this with spacy-nightly.

Solution

You can skip the intermediate JSON step and convert the annotations directly to a DocBin:

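import spacy
from spacy.training import Example
from spacy.tokens import DocBin

td = [["Who is Shaka Khan?", {"entities": [(7, 17, "FRIENDS")]}], ["I like London.", {"entities": [(7, 13, "LOC")]}]]

# A blank pipeline is enough here: only the tokenizer is used
nlp = spacy.blank("en")
db = DocBin()

for text, annotations in td:
    # Build an Example from the tokenized text and its entity offsets
    example = Example.from_dict(nlp.make_doc(text), annotations)
    # The reference doc carries the gold-standard annotations
    db.add(example.reference)

# Write the binary .spacy file
db.to_disk("td.spacy")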

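See: https://nightly.spacy.io/usage/v3#migrating-training-python

(If you do want to use the intermediate JSON format, follow this specification: https://spacy.io/api/annotation#json-input. You only need to include orth and ner in the tokens and can leave out the other features, but you need the structure with paragraphs, raw, and sentences. Here is an example: https://github.com/explosion/spaCy/blob/45c9a688285081cd69faa0627d9bcaf1f5e799a1/examples/training/training-data.json)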
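As a rough illustration of that nesting, here is a minimal pure-Python sketch that wraps the per-token BILUO tags from the question in a paragraphs / raw / sentences / tokens structure. The helper name `to_spacy_v2_json` is hypothetical, and treating each text as one paragraph containing one sentence is an assumption of this sketch. It has not been checked against the converter in any particular spaCy version, so treat the linked specification as the authority.

```python
import json


def to_spacy_v2_json(docs):
    """Wrap (raw_text, [(orth, ner_tag), ...]) pairs in the nested JSON layout.

    Hypothetical helper: each text becomes one document holding a single
    paragraph with a single sentence (an assumption of this sketch).
    """
    out = []
    for doc_id, (raw, tagged) in enumerate(docs):
        # Only orth and ner are filled in (plus a running id), as the answer suggests
        tokens = [
            {"id": i, "orth": orth, "ner": ner}
            for i, (orth, ner) in enumerate(tagged)
        ]
        out.append({
            "id": doc_id,
            "paragraphs": [{"raw": raw, "sentences": [{"tokens": tokens}]}],
        })
    return out


# Tokens and tags copied from the output shown in the question
tagged_docs = [
    ("Who is Shaka Khan?",
     [("Who", "O"), ("is", "O"), ("Shaka", "B-FRIENDS"),
      ("Khan", "L-FRIENDS"), ("?", "O")]),
    ("I like London.",
     [("I", "O"), ("like", "O"), ("London", "U-LOC"), (".", "O")]),
]

data = to_spacy_v2_json(tagged_docs)
# Inspect the first document; json.dump(data, fh) would write the whole list to a file
print(json.dumps(data[0], indent=2))
```

The difference from the script in the question is the wrapping: the top level is a list of documents, and the token lists sit inside sentences, which sit inside paragraphs, rather than forming a bare list of token lists.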
