Problem description
I'm new to spaCy and Python, and I want to use the library to visualize NER. This is the example I found:
import spacy
from spacy import displacy
NER = spacy.load("en_core_web_sm")
raw_text="The Indian Space Research Organisation or is the national space agency of India,headquartered in Bengaluru. It operates under Department of Space which is directly overseen by the Prime Minister of India while Chairman of ISRO acts as executive of DOS as well."
text1= NER(raw_text)
displacy.render(text1,style="ent",jupyter=True)
[812,834,"POS"],[838,853,"ORG"],[870,888,[892,920,[925,929,"ENGLEVEL"],[987,1002,"SKILL"],...
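I want to visualize my text with my own custom labels and entities, given as [start, end, label] annotations like the ones above, instead of spaCy's default NER output. How can I do that?
Solution
You need to create character spans that represent your entities and attach them to your doc object, like this: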
import spacy
from spacy import displacy
nlp = spacy.blank('en')
raw_text = "The Indian Space Research Organisation or is the national space agency of India,headquartered in Bengaluru. It operates under Department of Space which is directly overseen by the Prime Minister of India while Chairman of ISRO acts as executive of DOS as well."
doc = nlp.make_doc(raw_text)
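# Each entry is [start_char, end_char, label].
spans = [
    [812, 834, "POS"],
    [838, 853, "ORG"],
    # [870, 888, ...] and [892, 920, ...]: labels are missing in the original post
    [925, 929, "ENGLEVEL"],
    [987, 1002, "SKILL"],
]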
ents = []
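for span_start, span_end, label in spans:
    # char_span needs both the start and the end character offset
    ent = doc.char_span(span_start, span_end, label=label)
    if ent is None:
        # offsets did not map onto the doc; skip this annotation
        continue
    ents.append(ent)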
doc.ents = ents
displacy.render(doc,style="ent",jupyter=True)
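Change raw_text and spans to match your own data. If a span's start or end lies beyond the length of the text, doc.char_span() returns None, so you need to handle that case. The same happens when the offsets do not line up with token boundaries.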
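To make the None handling concrete, here is a minimal sketch under a few assumptions: the short text, the CITY and PARTIAL labels, and the skipped list are made up for illustration and are not from the original post. It records the annotations that doc.char_span() rejects instead of dropping them silently, and renders with jupyter=False so the markup comes back as a string.

```python
import spacy
from spacy import displacy

nlp = spacy.blank("en")  # tokenizer only, no trained model needed
text = "ISRO is headquartered in Bengaluru."  # illustrative text
doc = nlp.make_doc(text)

# Hypothetical annotations as [start_char, end_char, label].
spans = [
    [0, 4, "ORG"],        # "ISRO"
    [25, 34, "CITY"],     # "Bengaluru"
    [8, 12, "PARTIAL"],   # "head": ends in the middle of a token
    [812, 834, "POS"],    # lies beyond the end of the text
]

ents, skipped = [], []
for start, end, label in spans:
    ent = doc.char_span(start, end, label=label)
    if ent is None:
        # Keep track of rejected annotations so they can be inspected later.
        skipped.append((start, end, label))
        continue
    ents.append(ent)

doc.ents = ents

# jupyter=False returns the HTML markup instead of displaying it inline.
html = displacy.render(doc, style="ent", jupyter=False)
print([(e.text, e.label_) for e in doc.ents])
print("skipped:", skipped)
```

If many annotations end up in skipped because they cut through tokens, it may be worth trying the alignment_mode argument of doc.char_span() ("strict" by default, with "contract" and "expand" as alternatives) rather than discarding them.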