Custom NER training with spaCy 3 throws ValueError

Question

I am trying to add a custom NER label using spaCy 3. I found tutorials for older versions and adapted them for spaCy 3. This is the entire code I'm using:

import random
import spacy
from spacy.training import Example

LABEL = 'ANIMAL'
TRAIN_DATA = [
    ("Horses are too tall and they pretend to care about your feelings", {'entities': [(0, 6, LABEL)]}),
    ("Do they bite?", {'entities': []}),
    ("horses are too tall and they pretend to care about your feelings", {'entities': [(0, 6, LABEL)]}),
    ("horses pretend to care about your feelings", {'entities': [(0, 6, LABEL)]}),
    ("they pretend to care about your feelings, those horses", {'entities': [(48, 54, LABEL)]}),
    ("horses?", {'entities': [(0, 6, LABEL)]}),
]
nlp = spacy.load('en_core_web_sm')  # load existing spaCy model
ner = nlp.get_pipe('ner')
ner.add_label(LABEL)
print(ner.move_names)  # Here I see that the new label was added
optimizer = nlp.create_optimizer()
# get names of other pipes to disable them during training
other_pipes = [pipe for pipe in nlp.pipe_names if pipe != "ner"]
with nlp.disable_pipes(*other_pipes):  # only train NER
    for itn in range(20):
        random.shuffle(TRAIN_DATA)
        losses = {}
        for text, annotations in TRAIN_DATA:
            doc = nlp(text)
            example = Example.from_dict(doc, annotations)
            nlp.update([example], drop=0.35, sgd=optimizer, losses=losses)
        print(losses)
# test the trained model
# add some dummy sentences with many NERs

test_text = 'Do you like horses?'
doc = nlp(test_text)
print("Entities in '%s'" % test_text)
for ent in doc.ents:
    print(ent.label_, " -- ", ent.text)

The code raises a ValueError exception, but only after 2 iterations (note the first two lines):

{'ner': 9.862242701536594}
{'ner': 8.169456698315201}
Traceback (most recent call last):
  File ".\custom_ner_training.py", line 46, in <module>
    nlp.update([example], losses=losses)
  File "C:\ogr\moje\python\spacy_pg\myvenv\lib\site-packages\spacy\language.py", line 1106, in update
    proc.update(examples, sgd=None, losses=losses, **component_cfg[name])
  File "spacy\pipeline\transition_parser.pyx", line 366, in spacy.pipeline.transition_parser.Parser.update
  File "spacy\pipeline\transition_parser.pyx", line 478, in spacy.pipeline.transition_parser.Parser.get_batch_loss
  File "spacy\pipeline\_parser_internals\ner.pyx", line 310, in spacy.pipeline._parser_internals.ner.BiluoPushDown.set_costs
ValueError

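I can see that the ANIMAL label was added by calling ner.move_names.

When I change the value to LABEL = 'PERSON', the code runs successfully and recognizes horses as PERSON on new data. That is why I assume the code itself is correct.

Is there something I'm missing? What am I doing wrong? Could someone please reproduce this?

Note: this is my first question here. I hope I have provided all the information. If not, let me know in the comments.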

Solution

You need to change the following line in the for loop

doc = nlp(text)

to

doc = nlp.make_doc(text)

The code should then work and produce the following results:

{'ner': 9.60289144264557}
{'ner': 8.875474230820478}
{'ner': 6.370401408220459}
{'ner': 6.687456469517201}
... 
{'ner': 1.3796682589133492e-05}
{'ner': 1.7709562613218738e-05}

Entities in 'Do you like horses?'
ANIMAL  --  horses
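
For reference, here is a minimal sketch of the training loop with that one-line change applied. `nlp.make_doc` only runs the tokenizer, whereas `nlp(text)` also runs the enabled components, so the Doc passed to `Example.from_dict` would already carry the model's own entity predictions. That is a plausible cause of the ValueError, although the answer above does not state it. The helper names `make_examples` and `train_ner` are illustrative and not part of spaCy, the training data is shortened, and `nlp.select_pipes(disable=...)` is used here in place of `nlp.disable_pipes` as an assumption about preferred spaCy 3 style.

```python
import random

import spacy
from spacy.training import Example

LABEL = "ANIMAL"
# Shortened version of the question's training data.
TRAIN_DATA = [
    ("Horses are too tall and they pretend to care about your feelings",
     {"entities": [(0, 6, LABEL)]}),
    ("Do they bite?", {"entities": []}),
    ("they pretend to care about your feelings, those horses",
     {"entities": [(48, 54, LABEL)]}),
    ("horses?", {"entities": [(0, 6, LABEL)]}),
]


def make_examples(nlp, train_data):
    """Build Example objects from tokenized-only Docs (hypothetical helper)."""
    examples = []
    for text, annotations in train_data:
        # Tokenizer only: no pipeline component runs, so doc.ents stays empty.
        doc = nlp.make_doc(text)
        examples.append(Example.from_dict(doc, annotations))
    return examples


def train_ner(nlp, train_data, n_iter=20, drop=0.35):
    """Update only the 'ner' component and return the last iteration's losses
    (hypothetical helper; expects a pipeline that already has a trained 'ner')."""
    optimizer = nlp.create_optimizer()
    other_pipes = [pipe for pipe in nlp.pipe_names if pipe != "ner"]
    losses = {}
    with nlp.select_pipes(disable=other_pipes):  # only train NER
        for _ in range(n_iter):
            examples = make_examples(nlp, train_data)
            random.shuffle(examples)
            losses = {}
            for example in examples:
                nlp.update([example], drop=drop, sgd=optimizer, losses=losses)
    return losses
```

To mirror the question's script, load the pipeline with `spacy.load("en_core_web_sm")`, call `nlp.get_pipe("ner").add_label(LABEL)`, and then call `train_ner(nlp, TRAIN_DATA)`. This assumes the `en_core_web_sm` model is installed.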
