Visualizing custom NER labels with spaCy displaCy

Question

I'm new to spaCy and Python, and I want to use this library to visualize named entities (NER). Here is an example I found:

import spacy
from spacy import displacy

NER = spacy.load("en_core_web_sm")

raw_text="The Indian Space Research Organisation or is the national space agency of India,headquartered in Bengaluru. It operates under Department of Space which is directly overseen by the Prime Minister of India while Chairman of ISRO acts as executive of DOS as well."

text1= NER(raw_text)

displacy.render(text1,style="ent",jupyter=True)

[Image: the example visualization]

However, I already have a list of custom labels and their character positions:

 [812,834,"POS"],[838,853,"ORG"],[870,888,[892,920,[925,929,"ENGLEVEL"],[987,1002,"SKILL"],...

I want to visualize my text with my own custom labels and entities instead of spaCy's default NER. How can I do that?

Answer

You need to create character spans for your entities and attach them to your doc object, like this:

import spacy
from spacy import displacy

nlp = spacy.blank('en')
raw_text = "The Indian Space Research Organisation or is the national space agency of India,headquartered in Bengaluru. It operates under Department of Space which is directly overseen by the Prime Minister of India while Chairman of ISRO acts as executive of DOS as well."
doc = nlp.make_doc(raw_text)
spans = [[812,834,"POS"],[838,853,"ORG"],[925,929,"ENGLEVEL"],[987,1002,"SKILL"]]  # (start, end, label) triples
ents = []
for span_start,span_end,label in spans:
    ent = doc.char_span(span_start,span_end,label=label)
    if ent is None:  # offsets outside the text or not on token boundaries
        continue

    ents.append(ent)

doc.ents = ents  # raises an error if any of the spans overlap
displacy.render(doc,style="ent",jupyter=True)
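If you don't need spaCy's tokenization at all, displaCy also documents a "manual" mode that renders a plain dict of text plus character offsets, which maps directly onto a list like [812,834,"POS"]. Below is a minimal sketch that assumes spaCy v3. The helper to_manual_ents, the short text, its offsets, and the colors are illustrative assumptions and are not part of the original answer:

```python
from spacy import displacy


def to_manual_ents(text, spans):
    """Hypothetical helper: build displaCy's manual "ent" input.

    Takes [start, end, label] triples, drops spans that fall outside the
    text (or have start >= end), and sorts the rest by start offset.
    """
    ents = [
        {"start": start, "end": end, "label": label}
        for start, end, label in spans
        if 0 <= start < end <= len(text)
    ]
    ents.sort(key=lambda ent: (ent["start"], ent["end"]))
    return {"text": text, "ents": ents, "title": None}


raw_text = "ISRO is the national space agency of India, headquartered in Bengaluru."
# Hypothetical character offsets and labels for this short example text
spans = [[61, 70, "LOC"], [0, 4, "ORG"], [37, 42, "LOC"], [500, 510, "SKILL"]]

example = to_manual_ents(raw_text, spans)
# manual=True renders the dict as given, without a Doc or tokenizer;
# jupyter=False returns the markup, page=True wraps it in a full HTML page
html = displacy.render(
    [example],
    style="ent",
    manual=True,
    jupyter=False,
    page=True,
    options={"colors": {"ORG": "#7aecec", "LOC": "#ff9561"}},
)

with open("entities.html", "w", encoding="utf-8") as f:
    f.write(html)
```

Opening entities.html in a browser shows the highlighted text when you are not working in a notebook. Labels missing from the colors option are drawn in displaCy's default color.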

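Change raw_text and spans to match your own data. If a span's start or end offset lies beyond the length of the text, doc.char_span() returns None, so you need to handle that case. In its default "strict" alignment mode, char_span() also returns None when the offsets do not fall exactly on token boundaries.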
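Two problems come up often with real offset lists: an offset that splits a token, and spans that overlap, which doc.ents refuses to accept. The following sketch shows one way to deal with both. It assumes spaCy v3, where char_span() accepts an alignment_mode argument, and uses spacy.util.filter_spans. The text and spans are made-up examples:

```python
import spacy
from spacy.util import filter_spans

nlp = spacy.blank("en")  # tokenizer only; no trained pipeline needed
raw_text = "The Indian Space Research Organisation is headquartered in Bengaluru."
doc = nlp.make_doc(raw_text)

# Made-up (start, end, label) triples showing typical problems
spans = [
    [4, 38, "ORG"],       # "Indian Space Research Organisation"
    [11, 25, "ORG"],      # "Space Research": overlaps the span above
    [59, 66, "LOC"],      # "Bengalu": ends inside the token "Bengaluru"
    [500, 510, "SKILL"],  # lies beyond the end of the text
]

# In the default "strict" mode, the misaligned offsets give None
print(doc.char_span(59, 66, label="LOC"))

candidates = []
for start, end, label in spans:
    # "expand" widens a misaligned span to the surrounding token boundaries
    span = doc.char_span(start, end, label=label, alignment_mode="expand")
    if span is None:  # still None, e.g. for offsets outside the text
        continue
    candidates.append(span)

# filter_spans drops overlaps (preferring the longest span) so doc.ents accepts them
doc.ents = filter_spans(candidates)
print([(ent.text, ent.label_) for ent in doc.ents])
```

Whether "expand" is the right choice depends on how your offsets were produced. The alternative, "contract", shrinks the span to the tokens that lie fully inside the offsets.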
