Which model does the transformers feature-extraction pipeline use by default?

Problem description

I am looking here at the feature extraction pipeline.

I initialize it with the following code:

from transformers import pipeline
pipe = pipeline("feature-extraction")
features = pipe("test")

This gives me a lot of features. What model is used by default, and how can I initialize this pipeline to use a specific pretrained model?

>>> len(features)
1
>>> features
[[[0.4122459590435028,0.10175584256649017,0.09342928230762482,-0.3119196593761444,-0.3226662278175354,-0.16414110362529755,0.06356583535671234,-0.03167172893881798,-0.010002809576690197,-1.1153486967086792,-0.3304346203804016,0.1727224737405777,-0.0904250368475914,-0.04243310168385506,-0.4745883047580719,0.09118127077817917,0.4240476191043854,0.2237153798341751,0.12108077108860016,-0.16883963346481323,0.055300742387771606,-0.07225772738456726,0.4521999955177307,-0.31655701994895935,0.05917530879378319,-0.0343029648065567,0.4157347083091736,0.10791877657175064,-0
...etc
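The nested output corresponds to (batch, tokens, hidden_size): even the one-word input "test" is tokenized into several tokens (e.g. [CLS], test, [SEP]), each with its own vector. A minimal sketch for inspecting the dimensions of such a nested list (a small dummy stands in for the real `features`, whose vectors are much longer):

```python
def nested_shape(obj):
    """Return the dimensions of a uniformly nested list as a tuple."""
    shape = []
    while isinstance(obj, list):
        shape.append(len(obj))
        obj = obj[0]
    return tuple(shape)

# Dummy stand-in for the pipeline output: 1 sentence, 3 tokens, 4-dim vectors
dummy_features = [[[0.1, 0.2, 0.3, 0.4]] * 3]
print(nested_shape(dummy_features))  # (1, 3, 4)
```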

The documentation tells me:

All models may be used for this pipeline. See a list of all models, including community-contributed models, on huggingface.co/models.

It is not clear to me where on that page I can find out how to initialize a specific model. The API is very terse.

Solution

Unfortunately, as you correctly point out, the pipelines documentation is very sparse. However, the source code specifies which model is used by default; see here. Specifically, the default model is distilbert-base-cased.

For how to use a specific model, see my related answer here. You can simply specify the model and tokenizer parameters like this:

from transformers import pipeline

# Feature extraction pipeline, specifying the checkpoint identifier
pipe = pipeline('feature-extraction', model='bert-base-cased', tokenizer='bert-base-cased')
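Once the pipeline returns per-token vectors, a common way to obtain one fixed-size embedding per sentence is to mean-pool over the token axis. A sketch with numpy; the random array below is only a stand-in for the pipeline's (1, num_tokens, hidden_size) output, and the 768 hidden size is an assumption based on BERT-base:

```python
import numpy as np

# Stand-in for pipe("test"): shape (batch=1, num_tokens=5, hidden_size=768)
features = np.random.rand(1, 5, 768)

# Mean-pool over the token axis -> one vector per input sentence
sentence_embedding = features.mean(axis=1)
print(sentence_embedding.shape)  # (1, 768)
```

Whether mean pooling (versus, say, taking the [CLS] vector) is appropriate depends on your downstream task.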