Problem description
I want to use the bert-english-uncased-finetuned-pos transformer mentioned here.
I query the transformer like this...
from transformers import AutoTokenizer, AutoModelForTokenClassification
tokenizer = AutoTokenizer.from_pretrained("vblagoje/bert-english-uncased-finetuned-pos")
model = AutoModelForTokenClassification.from_pretrained("vblagoje/bert-english-uncased-finetuned-pos")
text = "My name is Clara and I live in Berkeley, California."
input_ids = tokenizer.encode(text + '</s>', return_tensors='pt')
outputs = model(input_ids)
But outputs looks like this:
(tensor([[[-1.8196e+00, -1.9783e+00, -1.7416e+00, 1.2082e+00, -7.0337e-02,
-7.0322e-03, 3.4300e-01, -9.6914e-01, -1.3546e+00, 7.7266e-03,
3.7128e+00, -3.4061e-01, 4.8385e+00, -1.2548e+00, -5.1845e-01,
7.0140e-01, 1.0394e+00],
[-1.2702e+00, -1.5518e+00, -1.1553e+00, -4.4077e-01, -9.8661e-01,
-3.2680e-01, -6.5338e-01, -3.9779e-01, -7.5383e-01, -1.2677e+00,
9.6353e+00, 1.9938e-01, -1.0282e+00, -7.5071e-01, -1.0307e+00,
-8.0589e-01, 4.2073e-01],
[-9.6988e-01, -5.0090e-01, -1.3858e+00, -1.0554e+00, -1.4040e+00,
-7.5977e-01, -7.4156e-01, 8.0594e+00, -5.1854e-01, -1.9098e+00,
-1.6362e-02, 1.0594e+00, -8.4962e-01, -1.7415e+00, -1.0628e+00,
-1.7485e-01, -1.1490e+00],
[-1.4368e+00, -1.6313e-01, -1.3202e+00, 8.7465e+00, -1.3782e+00,
-9.8889e-01, -1.1371e+00, -1.0917e+00, -9.8495e-01, -9.3237e-01,
-9.6111e-01, -4.1658e-01, -7.3133e-01, -9.6004e-01, -9.5337e-01,
3.1836e+00, -8.3462e-01],
[-7.9476e-01, -7.9640e-01, -9.0027e-01, -6.9506e-01, -8.9706e-01,
-6.9383e-01, -3.1590e-01, 1.2390e+00, -1.0443e+00, -9.9977e-01,
-8.8189e-01, 8.7941e+00, -9.9445e-01, -1.2076e+00, -1.1424e+00,
-9.7801e-01, 5.6683e-01],
[-8.2837e-01, -5.5060e-01, -2.1352e-01, -8.8721e-01, 9.5536e+00,
1.0478e+00, -5.6208e-01, -7.1037e-01, -7.0248e-01, 1.1298e-01
...
-7.3788e-01, 4.3640e-03, 1.6994e+00, 1.1528e-01, -1.0983e+00,
-8.9202e-01, -1.2869e+00, 4.9141e+00, -6.2096e-01, 4.8374e+00,
3.2384e-01, 4.6213e-01],
[-1.3622e+00, 2.0772e+00, -1.6680e+00, -8.8679e-01, -8.6959e-01,
-1.7468e+00, -1.1424e+00, 1.6996e+00, 3.5800e-01, -4.3927e-01,
-3.6129e-01, -4.2220e-01, -1.7912e+00, 8.0154e-01, 7.4594e-01,
-1.0620e+00, 3.8152e+00],
[-1.2889e+00, -2.9379e-01, -1.6543e+00, -4.3326e-01, -2.4919e-01,
-4.0112e-01, -4.4255e-01, 2.2697e-01, -4.6042e-01, -3.7862e-03,
-6.3061e-01, -1.3280e+00, 8.5533e+00, -4.6881e-01, 2.3882e+00,
2.4533e-01, -1.4095e-01],
[-9.5640e-01, -5.7213e-01, -1.0245e+00, -5.3566e-01, -1.5287e-01,
-6.6977e-01, -5.3392e-01, -3.1967e-02, -7.3077e-01, -3.1048e-01,
-7.2973e-01, -3.1701e-01, 1.0196e+01, -5.2346e-01, 4.0820e-01,
-2.1350e-01, 1.0340e+00]]], grad_fn=),)
But according to the documentation, I expected the output to be in JSON format...
[ { "entity_group": "PRON", "score": 0.9994694590568542, "word": "my" },
  { "entity_group": "NOUN", "score": 0.997125506401062, "word": "name" },
  { "entity_group": "AUX", "score": 0.9938186407089233, "word": "is" },
  { "entity_group": "PROPN", "score": 0.9983252882957458, "word": "clara" },
  { "entity_group": "CCONJ", "score": 0.9991229772567749, "word": "and" },
  { "entity_group": "PRON", "score": 0.9994894862174988, "word": "i" },
  { "entity_group": "VERB", "score": 0.9983153939247131, "word": "live" },
  { "entity_group": "ADP", "score": 0.999370276927948, "word": "in" },
  { "entity_group": "PROPN", "score": 0.9987357258796692, "word": "berkeley" },
  { "entity_group": "PUNCT", "score": 0.9996636509895325, "word": "," },
  { "entity_group": "PROPN", "score": 0.9985638856887817, "word": "california" },
  { "entity_group": "PUNCT", "score": 0.9996631145477295, "word": "." } ]
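Solution
What you are seeing is Hugging Face's proprietary Inference API. That API is not part of the transformers library, but you can build something similar yourself. All you need is the TokenClassificationPipeline: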
from transformers import AutoTokenizer, AutoModelForTokenClassification, TokenClassificationPipeline
tokenizer = AutoTokenizer.from_pretrained("vblagoje/bert-english-uncased-finetuned-pos")
model = AutoModelForTokenClassification.from_pretrained("vblagoje/bert-english-uncased-finetuned-pos")
# The pipeline tokenizes the text, runs the model, and maps each token's logits to a label and score
p = TokenClassificationPipeline(model=model, tokenizer=tokenizer)
p('My name is Clara and I live in Berkeley, California.')
Output:
[{'word': 'my', 'score': 0.9994694590568542, 'entity': 'PRON', 'index': 1},
 {'word': 'name', 'score': 0.9971255660057068, 'entity': 'NOUN', 'index': 2},
 {'word': 'is', 'score': 0.9938186407089233, 'entity': 'AUX', 'index': 3},
 {'word': 'clara', 'score': 0.9983252882957458, 'entity': 'PROPN', 'index': 4},
 {'word': 'and', 'score': 0.9991229772567749, 'entity': 'CCONJ', 'index': 5},
 {'word': 'i', 'score': 0.9994894862174988, 'entity': 'PRON', 'index': 6},
 {'word': 'live', 'score': 0.9983154535293579, 'entity': 'VERB', 'index': 7},
 {'word': 'in', 'score': 0.999370276927948, 'entity': 'ADP', 'index': 8},
 {'word': 'berkeley', 'score': 0.9987357258796692, 'entity': 'PROPN', 'index': 9},
 {'word': ',', 'score': 0.9996636509895325, 'entity': 'PUNCT', 'index': 10},
 {'word': 'california', 'score': 0.9985638856887817, 'entity': 'PROPN', 'index': 11},
 {'word': '.', 'score': 0.9996631145477295, 'entity': 'PUNCT', 'index': 12}]
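The pipeline labels each token under the key 'entity', while the JSON in the question uses "entity_group". If you want the same layout, a plain key rename is probably enough for this sentence. The sketch below is only that: to_api_shape is a hypothetical helper name, not a transformers function, and it does not reproduce whatever grouping of sub-word pieces the hosted API may apply.

```python
import json


def to_api_shape(pipeline_output):
    """Rename pipeline keys to the entity_group/score/word layout from the question.

    Hypothetical helper for illustration; it only renames keys and drops 'index'.
    """
    return [
        {"entity_group": item["entity"], "score": item["score"], "word": item["word"]}
        for item in pipeline_output
    ]


# Two entries copied from the pipeline output above.
sample = [
    {"word": "my", "score": 0.9994694590568542, "entity": "PRON", "index": 1},
    {"word": "name", "score": 0.9971255660057068, "entity": "NOUN", "index": 2},
]

reshaped = to_api_shape(sample)
print(json.dumps(reshaped, indent=2))
```

In practice you would pass the list returned by p(...) instead of the hand-written sample.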
You can find the other available pipelines that the Inference API may use here.
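For context, the tensor printed in the question holds raw logits: one row per token and one column per label. Roughly speaking, the pipeline applies a softmax to each row, takes the highest-scoring label, and skips special tokens. Here is a minimal pure-Python sketch of that post-processing step. The label mapping, tokens, and logits are made-up toy values, and decode_logits is a hypothetical helper. With the real model you would, as far as I know, take the mapping from model.config.id2label and the tokens from tokenizer.convert_ids_to_tokens.

```python
import math


def decode_logits(logits, tokens, id2label, special_tokens=("[CLS]", "[SEP]")):
    """Turn per-token logits into pipeline-style dicts via softmax and argmax."""
    results = []
    for index, (token, row) in enumerate(zip(tokens, logits)):
        if token in special_tokens:
            continue  # special tokens are not reported
        # Numerically stable softmax over the label dimension.
        peak = max(row)
        exps = [math.exp(value - peak) for value in row]
        total = sum(exps)
        probs = [value / total for value in exps]
        best = max(range(len(probs)), key=probs.__getitem__)
        results.append(
            {"word": token, "score": probs[best], "entity": id2label[best], "index": index}
        )
    return results


# Toy values for illustration only (assumed, not taken from the real model).
id2label = {0: "NOUN", 1: "PRON", 2: "VERB"}
tokens = ["[CLS]", "my", "name", "[SEP]"]
logits = [
    [0.0, 0.0, 0.0],
    [0.1, 5.0, -1.0],
    [6.0, 0.2, -0.5],
    [0.0, 0.0, 0.0],
]

decoded = decode_logits(logits, tokens, id2label)
for item in decoded:
    print(item)
```

Because [CLS] occupies position 0, the first real token gets index 1, which matches the numbering in the pipeline output.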