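How do I load a saved NER model back into a HuggingFace pipeline?

Problem description

I am looking into HuggingFace's transfer learning functionality, specifically for named entity recognition. To preface, I am fairly new to transformer architectures. I briefly walked through the example on their website: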

from transformers import pipeline

nlp = pipeline("ner")

sequence = "Hugging Face Inc. is a company based in New York City. Its headquarters are in DUMBO, therefore very " \
           "close to the Manhattan Bridge which is visible from the window."

print(nlp(sequence))

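What I would like to do is save the model and run it locally, without having to download the "ner" model every time (it is over 1 GB). In their documentation I see that you can save the pipeline to a local folder with the "pipeline.save_pretrained()" function. The result is a set of files stored in the folder I specify.

My question is: after saving, how do I load this model back into a script to continue classifying? The output of "pipeline.save_pretrained()" is multiple files.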
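A minimal sketch of one way to reload the saved folder, assuming save_pretrained() wrote both the model and the tokenizer files into a single directory. The helper name load_saved_ner and the folder name ./ner_pipeline are hypothetical. pipeline() accepts a local path for its model and tokenizer arguments:

```python
from pathlib import Path


def load_saved_ner(save_dir):
    """Rebuild an NER pipeline from a folder written by save_pretrained().

    `load_saved_ner` is a hypothetical helper name, not part of transformers.
    """
    folder = Path(save_dir)
    # save_pretrained() writes a config.json next to the weights; if it is
    # missing, the path most likely points at the wrong place.
    if not (folder / "config.json").is_file():
        raise FileNotFoundError(f"no config.json found in {folder}")

    # Imported here so the cheap path check above runs before the heavy import.
    from transformers import pipeline

    # Pass the folder itself (not the individual files) for both arguments.
    return pipeline("ner", model=str(folder), tokenizer=str(folder))


# Usage, assuming an earlier run called nlp.save_pretrained("./ner_pipeline"):
# nlp = load_saved_ner("./ner_pipeline")
# print(nlp("Hugging Face Inc. is a company based in New York City."))
```

Loading from a local folder this way should avoid the repeated download, since nothing has to be fetched from the model hub.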

Here is what I have tried so far:

1: Following the documentation on pipelines

pipe = transformers.TokenClassificationPipeline(model="pytorch_model.bin",tokenizer='tokenizer_config.json')

The error I get is: 'str' object has no attribute 'config'
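The error message suggests that TokenClassificationPipeline expects an already loaded model and tokenizer object rather than file names. A sketch under that assumption, where build_ner_pipeline is a hypothetical helper and save_dir is whatever folder was passed to save_pretrained():

```python
import os


def build_ner_pipeline(save_dir):
    """Load model and tokenizer objects first, then hand them to the pipeline class.

    `build_ner_pipeline` is a hypothetical helper name used for illustration.
    """
    # from_pretrained() is given the folder here, not pytorch_model.bin or
    # tokenizer_config.json inside it.
    if not os.path.isdir(save_dir):
        raise NotADirectoryError(f"expected a folder, got: {save_dir}")

    # Imported after the path check so a wrong path fails fast.
    from transformers import (
        AutoModelForTokenClassification,
        AutoTokenizer,
        TokenClassificationPipeline,
    )

    model = AutoModelForTokenClassification.from_pretrained(save_dir)
    tokenizer = AutoTokenizer.from_pretrained(save_dir)
    # The pipeline class receives objects; a plain string has no .config attribute.
    return TokenClassificationPipeline(model=model, tokenizer=tokenizer)
```

The directory check is only a convenience: it turns the mistake of passing a single file into an explicit error before any model loading starts.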

2: Following the HuggingFace NER example:

from transformers import AutoModelForTokenClassification, AutoTokenizer
import torch

model = AutoModelForTokenClassification.from_pretrained("path to folder following .save_pretrained()")
tokenizer = AutoTokenizer.from_pretrained("path to folder following .save_pretrained()")

label_list = [
    "O",       # Outside of a named entity
    "B-MISC",  # Beginning of a miscellaneous entity right after another miscellaneous entity
    "I-MISC",  # Miscellaneous entity
    "B-PER",   # Beginning of a person's name right after another person's name
    "I-PER",   # Person's name
    "B-ORG",   # Beginning of an organisation right after another organisation
    "I-ORG",   # Organisation
    "B-LOC",   # Beginning of a location right after another location
    "I-LOC",   # Location
]

sequence = "Hugging Face Inc. is a company based in New York City. Its headquarters are in DUMBO, therefore very " \
           "close to the Manhattan Bridge."

# Bit of a hack to get the tokens with the special tokens
tokens = tokenizer.tokenize(tokenizer.decode(tokenizer.encode(sequence)))
inputs = tokenizer.encode(sequence, return_tensors="pt")

outputs = model(inputs)[0]
predictions = torch.argmax(outputs, dim=2)

print([(token, label_list[prediction]) for token, prediction in zip(tokens, predictions[0].tolist())])

This produces the error: list index out of range
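One possible cause (unconfirmed for this model) is that the model predicts a label id that the hand-written label_list does not cover. A sketch that looks names up in a mapping such as model.config.id2label instead of a hard-coded list; label_tokens is a hypothetical helper, and the mapping and ids below are made up for illustration:

```python
def label_tokens(tokens, prediction_ids, id2label):
    """Pair each token with its label name without risking an IndexError.

    `id2label` is a dict such as model.config.id2label (int id -> label name).
    Ids missing from the mapping are reported instead of raising.
    """
    return [
        (token, id2label.get(pred_id, f"UNKNOWN-{pred_id}"))
        for token, pred_id in zip(tokens, prediction_ids)
    ]


# Made-up mapping and predictions, only to show the behaviour.
example_id2label = {0: "O", 1: "B-ORG", 2: "I-ORG"}
print(label_tokens(["Hugging", "Face", "is"], [1, 2, 7], example_id2label))
```

With the loaded model from the attempt above, the call would look like label_tokens(tokens, predictions[0].tolist(), model.config.id2label), which also shows whether the saved model's labels match the list that was typed in by hand.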

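I also tried printing just the predictions, but that does not return the tokens and their entities in text form.

Any help would be much appreciated!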

