Problem description
I want to use AllenNLP and the coref-spanbert-large model to resolve coreference without Internet access. I tried to follow the approach described here: https://demo.allennlp.org/coreference-resolution
My code:
from allennlp.predictors.predictor import Predictor
import allennlp_models.tagging
predictor = Predictor.from_path(r"C:\Users\aap\Desktop\coref-spanbert-large-2021.03.10.tar.gz")
example = 'Paul Allen was born on January 21,1953,in Seattle,Washington,to Kenneth Sam Allen and edna Faye Allen.Allen attended Lakeside School,a private school in Seattle,where he befriended Bill Gates,two years younger,with whom he shared an enthusiasm for computers.'
pred = predictor.predict(document=example)
coref_res = predictor.coref_resolved(example)
print(pred)
print(coref_res)
The code works fine when I have Internet access. Without Internet access, I get the following error:
Traceback (most recent call last):
  File "C:/Users/aap/Desktop/CoreNLP/Coref_AllenNLP.py", line 14, in <module>
    predictor = Predictor.from_path(r"C:\Users\aap\Desktop\coref-spanbert-large-2021.03.10.tar.gz")
  File "C:\Users\aap\Desktop\CoreNLP\corenlp\lib\site-packages\allennlp\predictors\predictor.py", line 361, in from_path
    load_archive(archive_path,cuda_device=cuda_device,overrides=overrides),
  File "C:\Users\aap\Desktop\CoreNLP\corenlp\lib\site-packages\allennlp\models\archival.py", line 206, in load_archive
    config.duplicate(),serialization_dir
  File "C:\Users\aap\Desktop\CoreNLP\corenlp\lib\site-packages\allennlp\models\archival.py", line 232, in _load_dataset_readers
    dataset_reader_params,serialization_dir=serialization_dir
  File "C:\Users\aap\Desktop\CoreNLP\corenlp\lib\site-packages\allennlp\common\from_params.py", line 604, in from_params
    **extras,
  File "C:\Users\aap\Desktop\CoreNLP\corenlp\lib\site-packages\allennlp\common\from_params.py", line 632, in from_params
    kwargs = create_kwargs(constructor_to_inspect,cls,params,**extras)
  File "C:\Users\aap\Desktop\CoreNLP\corenlp\lib\site-packages\allennlp\common\from_params.py", line 200, in create_kwargs
    cls.__name__,param_name,annotation,param.default,**extras
  File "C:\Users\aap\Desktop\CoreNLP\corenlp\lib\site-packages\allennlp\common\from_params.py", line 307, in pop_and_construct_arg
    return construct_arg(class_name,name,popped_params,default,line 391,in construct_arg
    **extras,line 341,in construct_arg
    return annotation.from_params(params=popped_params,**subextras)
  File "C:\Users\aap\Desktop\CoreNLP\corenlp\lib\site-packages\allennlp\common\from_params.py", line 634, in from_params
    return constructor_to_call(**kwargs) # type: ignore
  File "C:\Users\aap\Desktop\CoreNLP\corenlp\lib\site-packages\allennlp\data\token_indexers\pretrained_transformer_mismatched_indexer.py", line 63, in __init__
    **kwargs,
  File "C:\Users\aap\Desktop\CoreNLP\corenlp\lib\site-packages\allennlp\data\token_indexers\pretrained_transformer_indexer.py", line 58, in __init__
    model_name,tokenizer_kwargs=tokenizer_kwargs
  File "C:\Users\aap\Desktop\CoreNLP\corenlp\lib\site-packages\allennlp\data\tokenizers\pretrained_transformer_tokenizer.py",line 71,add_special_tokens=False,**tokenizer_kwargs
  File "C:\Users\aap\Desktop\CoreNLP\corenlp\lib\site-packages\allennlp\common\cached_transformers.py", line 110, in get_tokenizer
    **kwargs,
  File "C:\Users\aap\Desktop\CoreNLP\corenlp\lib\site-packages\transformers\models\auto\tokenization_auto.py", line 362, in from_pretrained
    config = AutoConfig.from_pretrained(pretrained_model_name_or_path,**kwargs)
  File "C:\Users\aap\Desktop\CoreNLP\corenlp\lib\site-packages\transformers\models\auto\configuration_auto.py", line 368, in from_pretrained
    config_dict,_ = PretrainedConfig.get_config_dict(pretrained_model_name_or_path,**kwargs)
  File "C:\Users\aap\Desktop\CoreNLP\corenlp\lib\site-packages\transformers\configuration_utils.py", line 424, in get_config_dict
    use_auth_token=use_auth_token,
  File "C:\Users\aap\Desktop\CoreNLP\corenlp\lib\site-packages\transformers\file_utils.py", line 1087, in cached_path
    local_files_only=local_files_only,line 1268,in get_from_cache
    "Connection error, and we cannot find the requested files in the cached path."
ValueError: Connection error, and we cannot find the requested files in the cached path. Please try again or make sure your Internet connection is on.
Process finished with exit code 1
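What do I need in order to run my code without an Internet connection?
Solution
You will need local copies of the transformer model's configuration file and vocabulary, so that the tokenizer and the token indexer do not have to download them: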
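from transformers import AutoConfig, AutoTokenizer
# Run this once on a machine with Internet access, then copy the directory to the offline machine.
# Assumption: this is the transformer named under "model_name" in the archive's config.json; check it there.
transformer_model_name = "SpanBERT/spanbert-large-cased"
local_config_path = r"C:\Users\aap\Desktop\spanbert-large-cased"  # hypothetical directory, pick your own
tokenizer = AutoTokenizer.from_pretrained(transformer_model_name)
config = AutoConfig.from_pretrained(transformer_model_name)
tokenizer.save_pretrained(local_config_path)  # writes the vocabulary and tokenizer files
config.to_json_file(local_config_path + "/config.json")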
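Before copying the directory to the offline machine, it may be worth checking that the files actually landed there. The following is only a minimal sketch: the helper name missing_files is hypothetical, and the expected file names are an assumption (config.json is written by the to_json_file call above, and vocab.txt is what BERT-style tokenizers normally write in save_pretrained; other tokenizers use different file names).

```python
from pathlib import Path

# Assumed file names, adjust them for the tokenizer you actually saved.
EXPECTED_FILES = ("config.json", "vocab.txt")


def missing_files(local_dir, expected=EXPECTED_FILES):
    """Return the expected file names that are not present in local_dir."""
    root = Path(local_dir)
    return [name for name in expected if not (root / name).is_file()]
```

Calling missing_files(local_config_path) should return an empty list once both files are in place; any names it returns are the ones still missing.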
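Then you need to override the transformer model name in the configuration file so that it points to the local directory (local_config_path) where you saved these files: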
predictor = Predictor.from_path(
    r"C:\Users\aap\Desktop\coref-spanbert-large-2021.03.10.tar.gz",
    overrides={
        "dataset_reader.token_indexers.tokens.model_name": local_config_path,
        "validation_dataset_reader.token_indexers.tokens.model_name": local_config_path,
        "model.text_field_embedder.tokens.model_name": local_config_path,
    },
)