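Is there a way to use spacy-transformers offline from disk?

Problem description

I want to use spacy-transformers in a corporate environment with restricted internet access, so I have to download the transformer models manually from the Hugging Face Hub and get them working in spaCy.

In this example, I am trying to use the transformer pipeline component from the pretrained en_core_web_trf model: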

import spacy
import spacy_transformers

nlp_trf = spacy.load("en_core_web_trf")  # load the pretrained RoBERTa-based pipeline
transformer = nlp_trf.get_pipe("transformer")  # get the transformer pipeline component
transformer.to_disk("transformer_pretrained")  # save the pipeline component to disk

nlp = spacy.blank("en")
trf = nlp.add_pipe("transformer")
trf.from_disk("transformer_pretrained", exclude=["vocab"])  # load the component from the same directory

I get the following error message:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-23-c66c45181d83> in <module>
      1 #trf.model.initialize([nlp.make_doc("hello world")])
----> 2 trf.from_disk("models/transformer_pretrained",exclude=["vocab"])
      3 nlp.pipe_names

C:\_Development\python37\site-packages\spacy_transformers\pipeline_component.py in from_disk(self,path,exclude)
    400             "model": load_model,
    401         }
--> 402         util.from_disk(path,deserialize,exclude)
    403         return self

C:\_Development\python37\site-packages\spacy\util.py in from_disk(path,readers,exclude)
   1172         # Split to support file names like meta.json
   1173         if key.split(".")[0] not in exclude:
-> 1174             reader(path / key)
   1175     return path
   1176 

C:\_Development\python37\site-packages\spacy_transformers\pipeline_component.py in load_model(p)
    390             p = Path(p).absolute()
    391             tokenizer,transformer = huggingface_from_pretrained(
--> 392                 p,self.model.attrs["tokenizer_config"]
    393             )
    394             self.model.attrs["tokenizer"] = tokenizer

C:\_Development\python37\site-packages\spacy_transformers\util.py in huggingface_from_pretrained(source,config)
     29     else:
     30         str_path = source
---> 31     tokenizer = AutoTokenizer.from_pretrained(str_path,**config)
     32     transformer = AutoModel.from_pretrained(str_path)
     33     ops = get_current_ops()

C:\_Development\python37\site-packages\transformers\models\auto\tokenization_auto.py in from_pretrained(cls,pretrained_model_name_or_path,*inputs,**kwargs)
    388         kwargs["_from_auto"] = True
    389         if not isinstance(config,PretrainedConfig):
--> 390             config = AutoConfig.from_pretrained(pretrained_model_name_or_path,**kwargs)
    391 
    392         use_fast = kwargs.pop("use_fast",True)

C:\_Development\python37\site-packages\transformers\models\auto\configuration_auto.py in from_pretrained(cls,**kwargs)
    396         """
    397         kwargs["_from_auto"] = True
--> 398         config_dict,_ = PretrainedConfig.get_config_dict(pretrained_model_name_or_path,**kwargs)
    399         if "model_type" in config_dict:
    400             config_class = CONFIG_MAPPING[config_dict["model_type"]]

C:\_Development\python37\site-packages\transformers\configuration_utils.py in get_config_dict(cls,**kwargs)
    464                 local_files_only=local_files_only,
    465                 use_auth_token=use_auth_token,
--> 466                 user_agent=user_agent,
    467             )
    468             # Load config dict

C:\_Development\python37\site-packages\transformers\file_utils.py in cached_path(url_or_filename,cache_dir,force_download,proxies,resume_download,user_agent,extract_compressed_file,force_extract,use_auth_token,local_files_only)
   1171             user_agent=user_agent,
   1172             use_auth_token=use_auth_token,
-> 1173             local_files_only=local_files_only,
   1174         )
   1175     elif os.path.exists(url_or_filename):

C:\_Development\python37\site-packages\transformers\file_utils.py in get_from_cache(url,etag_timeout,local_files_only)
   1387                 else:
   1388                     raise ValueError(
-> 1389                         "Connection error, and we cannot find the requested files in the cached path."
   1390                         " Please try again or make sure your Internet connection is on."
   1391                     )

ValueError: Connection error, and we cannot find the requested files in the cached path. Please try again or make sure your Internet connection is on.

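As the error message says, the requested files cannot be found in the cached path. Can someone explain which files I have to put in the cache path? Or is there another way to pre-download models and use them in spaCy?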
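
A note on the traceback: the failing call is `AutoTokenizer.from_pretrained(str_path, **config)`, and it ends in `get_from_cache`. That suggests `str_path` was not recognized as an existing local directory and was instead treated as a model name to fetch from the hub. Hugging Face's `from_pretrained` accepts either a local directory or a hub model name, so a misspelled path, or a relative path that does not resolve from the current working directory, can lead to a connection error like the one shown. The sketch below is one way to rule that out before calling `trf.from_disk(...)`. The helper name `inspect_saved_dir` is hypothetical and uses only the standard library; it makes no claim about which files the directory should contain.

```python
from pathlib import Path


def inspect_saved_dir(path):
    """Return (absolute_path, sorted relative file names) for a saved directory.

    Raises FileNotFoundError if the directory does not exist, so the problem
    surfaces before any library gets a chance to interpret the path as a
    hub model name. (Hypothetical helper, not part of spaCy or transformers.)
    """
    resolved = Path(path).expanduser().resolve()
    if not resolved.is_dir():
        raise FileNotFoundError(f"No such directory: {resolved}")
    # Collect every file below the directory, relative to it, in a stable order.
    files = sorted(
        p.relative_to(resolved).as_posix()
        for p in resolved.rglob("*")
        if p.is_file()
    )
    return resolved, files
```

Passing the returned absolute path to `trf.from_disk(...)` avoids depending on the notebook's working directory, and printing the file list shows what `to_disk` actually wrote.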

Versions:

spaCy 3.0.5

spacy-transformers 1.0.2

transformers 4.5.1
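
As for another way to pre-download models: one option that may fit is to save the whole pipeline rather than a single component, copy the resulting directory to the restricted machine, and load it by path. `Language.to_disk` and `spacy.load` with a directory path are part of spaCy v3. The function names `export_pipeline` and `load_exported_pipeline`, and the directory name in the usage note, are hypothetical, and this sketch has not been verified against the versions listed above.

```python
from pathlib import Path


def export_pipeline(name, out_dir):
    """On a machine where the pipeline package is installed, write it to out_dir."""
    import spacy  # imported lazily so the helpers can be imported without spaCy

    nlp = spacy.load(name)
    nlp.to_disk(out_dir)  # writes config.cfg, meta.json and the component data
    return Path(out_dir).resolve()


def load_exported_pipeline(out_dir):
    """On the offline machine, load the copied directory by path."""
    path = Path(out_dir).resolve()
    # A pipeline saved with nlp.to_disk has a config.cfg at its top level.
    if not (path / "config.cfg").is_file():
        raise FileNotFoundError(f"{path} does not look like a saved spaCy pipeline")
    import spacy

    return spacy.load(path)
```

For example, call `export_pipeline("en_core_web_trf", "en_core_web_trf_offline")` on a machine where the package is installed, then `load_exported_pipeline("en_core_web_trf_offline")` after copying the directory over. spacy-transformers still has to be installed on the offline machine so that the transformer component can be created.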
