Parsing Hugging Face Transformer output

Problem description

I want to use the bert-english-uncased-finetuned-pos transformer mentioned here:

https://huggingface.co/vblagoje/bert-english-uncased-finetuned-pos?text=My+name+is+Clara+and+I+live+in+Berkeley%2C+California.

I query the transformer like this:

from transformers import AutoTokenizer, AutoModelForTokenClassification

tokenizer = AutoTokenizer.from_pretrained("vblagoje/bert-english-uncased-finetuned-pos")

model = AutoModelForTokenClassification.from_pretrained("vblagoje/bert-english-uncased-finetuned-pos")

text = "My name is Clara and I live in Berkeley, California."
input_ids = tokenizer.encode(text + '</s>', return_tensors='pt')
outputs = model(input_ids)

But outputs looks like this:

(tensor([[[-1.8196e+00, -1.9783e+00, -1.7416e+00, 1.2082e+00, -7.0337e-02, -7.0322e-03, 3.4300e-01, -9.6914e-01, -1.3546e+00, 7.7266e-03, 3.7128e+00, -3.4061e-01, 4.8385e+00, -1.2548e+00, -5.1845e-01, 7.0140e-01, 1.0394e+00],
[-1.2702e+00, -1.5518e+00, -1.1553e+00, -4.4077e-01, -9.8661e-01, -3.2680e-01, -6.5338e-01, -3.9779e-01, -7.5383e-01, -1.2677e+00, 9.6353e+00, 1.9938e-01, -1.0282e+00, -7.5071e-01, -1.0307e+00, -8.0589e-01, 4.2073e-01],
[-9.6988e-01, -5.0090e-01, -1.3858e+00, -1.0554e+00, -1.4040e+00, -7.5977e-01, -7.4156e-01, 8.0594e+00, -5.1854e-01, -1.9098e+00, -1.6362e-02, 1.0594e+00, -8.4962e-01, -1.7415e+00, -1.0628e+00, -1.7485e-01, -1.1490e+00],
[-1.4368e+00, -1.6313e-01, -1.3202e+00, 8.7465e+00, -1.3782e+00, -9.8889e-01, -1.1371e+00, -1.0917e+00, -9.8495e-01, -9.3237e-01, -9.6111e-01, -4.1658e-01, -7.3133e-01, -9.6004e-01, -9.5337e-01, 3.1836e+00, -8.3462e-01],
[-7.9476e-01, -7.9640e-01, -9.0027e-01, -6.9506e-01, -8.9706e-01, -6.9383e-01, -3.1590e-01, 1.2390e+00, -1.0443e+00, -9.9977e-01, -8.8189e-01, 8.7941e+00, -9.9445e-01, -1.2076e+00, -1.1424e+00, -9.7801e-01, 5.6683e-01],
[-8.2837e-01, -5.5060e-01, -2.1352e-01, -8.8721e-01, 9.5536e+00, 1.0478e+00, -5.6208e-01, -7.1037e-01, -7.0248e-01, 1.1298e-01,

...

-7.3788e-01, 4.3640e-03, 1.6994e+00, 1.1528e-01, -1.0983e+00, -8.9202e-01, -1.2869e+00, 4.9141e+00, -6.2096e-01, 4.8374e+00, 3.2384e-01, 4.6213e-01],
[-1.3622e+00, 2.0772e+00, -1.6680e+00, -8.8679e-01, -8.6959e-01, -1.7468e+00, -1.1424e+00, 1.6996e+00, 3.5800e-01, -4.3927e-01, -3.6129e-01, -4.2220e-01, -1.7912e+00, 8.0154e-01, 7.4594e-01, -1.0620e+00, 3.8152e+00],
[-1.2889e+00, -2.9379e-01, -1.6543e+00, -4.3326e-01, -2.4919e-01, -4.0112e-01, -4.4255e-01, 2.2697e-01, -4.6042e-01, -3.7862e-03, -6.3061e-01, -1.3280e+00, 8.5533e+00, -4.6881e-01, 2.3882e+00, 2.4533e-01, -1.4095e-01],
[-9.5640e-01, -5.7213e-01, -1.0245e+00, -5.3566e-01, -1.5287e-01, -6.6977e-01, -5.3392e-01, -3.1967e-02, -7.3077e-01, -3.1048e-01, -7.2973e-01, -3.1701e-01, 1.0196e+01, -5.2346e-01, 4.0820e-01, -2.1350e-01, 1.0340e+00]]], grad_fn=<...>),)

But according to the documentation, I expect the output to be JSON like this:

[
  { "entity_group": "PRON", "score": 0.9994694590568542, "word": "my" },
  { "entity_group": "NOUN", "score": 0.997125506401062, "word": "name" },
  { "entity_group": "AUX", "score": 0.9938186407089233, "word": "is" },
  { "entity_group": "PROPN", "score": 0.9983252882957458, "word": "clara" },
  { "entity_group": "CCONJ", "score": 0.9991229772567749, "word": "and" },
  { "entity_group": "PRON", "score": 0.9994894862174988, "word": "i" },
  { "entity_group": "VERB", "score": 0.9983153939247131, "word": "live" },
  { "entity_group": "ADP", "score": 0.999370276927948, "word": "in" },
  { "entity_group": "PROPN", "score": 0.9987357258796692, "word": "berkeley" },
  { "entity_group": "PUNCT", "score": 0.9996636509895325, "word": "," },
  { "entity_group": "PROPN", "score": 0.9985638856887817, "word": "california" },
  { "entity_group": "PUNCT", "score": 0.9996631145477295, "word": "." }
]

What am I doing wrong? How can I parse the current output into the desired JSON output?

Solution

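What you see on that page is Hugging Face's proprietary Inference API. It is not part of the transformers library, but you can build something similar. The model on its own only returns raw logits (one score per label for every token); turning those into labels and probabilities is the job of a pipeline. All you need is a TokenClassificationPipeline: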

from transformers import AutoTokenizer, AutoModelForTokenClassification, TokenClassificationPipeline

tokenizer = AutoTokenizer.from_pretrained("vblagoje/bert-english-uncased-finetuned-pos")

model = AutoModelForTokenClassification.from_pretrained("vblagoje/bert-english-uncased-finetuned-pos")
# The pipeline tokenizes the text, runs the model and maps the logits to POS labels
p = TokenClassificationPipeline(model=model, tokenizer=tokenizer)
p('My name is Clara and I live in Berkeley, California.')

Output:

[{'word': 'my', 'score': 0.9994694590568542, 'entity': 'PRON', 'index': 1},
 {'word': 'name', 'score': 0.9971255660057068, 'entity': 'NOUN', 'index': 2},
 {'word': 'is', 'score': 0.9938186407089233, 'entity': 'AUX', 'index': 3},
 {'word': 'clara', 'score': 0.9983252882957458, 'entity': 'PROPN', 'index': 4},
 {'word': 'and', 'score': 0.9991229772567749, 'entity': 'CCONJ', 'index': 5},
 {'word': 'i', 'score': 0.9994894862174988, 'entity': 'PRON', 'index': 6},
 {'word': 'live', 'score': 0.9983154535293579, 'entity': 'VERB', 'index': 7},
 {'word': 'in', 'score': 0.999370276927948, 'entity': 'ADP', 'index': 8},
 {'word': 'berkeley', 'score': 0.9987357258796692, 'entity': 'PROPN', 'index': 9},
 {'word': ',', 'score': 0.9996636509895325, 'entity': 'PUNCT', 'index': 10},
 {'word': 'california', 'score': 0.9985638856887817, 'entity': 'PROPN', 'index': 11},
 {'word': '.', 'score': 0.9996631145477295, 'entity': 'PUNCT', 'index': 12}]
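If you would rather parse the raw outputs from the question yourself, the core of what the pipeline does can be sketched in plain Python: apply a softmax to each token's row of logits, take the argmax, and look the index up in the model's label map. This is a minimal sketch, assuming the logits have already been converted to nested lists; softmax and decode_token_logits are hypothetical helper names, not transformers API, and the real pipeline also deals with offsets, subword aggregation and batching.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical helpers (not part of transformers): turn raw token-classification
# logits into pipeline-style dicts. Inputs are assumed to be plain nested lists.


def softmax(row: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one row of logits."""
    largest = max(row)
    exps = [math.exp(x - largest) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def decode_token_logits(
    logits: Sequence[Sequence[float]],
    tokens: Sequence[str],
    id2label: Dict[int, str],
    skip: Sequence[str] = ("[CLS]", "[SEP]", "[PAD]"),
) -> List[dict]:
    """Pick the most probable label for every token.

    logits:   one row of label scores per token (seq_len x num_labels)
    tokens:   the WordPiece tokens at the same positions
    id2label: label index -> tag name, e.g. model.config.id2label
    """
    results = []
    for index, (token, row) in enumerate(zip(tokens, logits)):
        if token in skip:
            continue  # special tokens carry no POS tag
        probs = softmax(row)
        best = max(range(len(probs)), key=probs.__getitem__)  # argmax
        results.append(
            {"word": token, "score": probs[best], "entity": id2label[best], "index": index}
        )
    return results
```

With the model and tokenizer from the question, something like encoded = tokenizer(text, return_tensors="pt"), then logits = model(**encoded)[0][0].tolist() inside torch.no_grad(), tokens = tokenizer.convert_ids_to_tokens(encoded["input_ids"][0].tolist()) and finally decode_token_logits(logits, tokens, model.config.id2label) should give a list very close to the pipeline output above. Note that the BERT tokenizer already adds [CLS] and [SEP] itself, so appending '</s>' to the text is not needed.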

You can find the other available pipelines, which the Inference API probably uses as well, here.
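To get the grouped shape the Inference API returns (entity_group / score / word), the per-token dicts can be post-processed. Below is a sketch under two assumptions: WordPiece continuation pieces start with "##" (as with BERT tokenizers), and averaging the piece scores is an acceptable group score. group_wordpieces is a hypothetical helper, and the "berk" / "##eley" split in the example is made-up input. Recent transformers versions can do similar grouping themselves through the aggregation_strategy argument of the token-classification pipeline, so check what your installed version supports.

```python
import json
from typing import List

# Hypothetical helper: merge per-token predictions into Inference-API-style groups.


def group_wordpieces(token_preds: List[dict]) -> List[dict]:
    """Join WordPiece continuations ("##...") onto the preceding word.

    Each input dict needs "word", "score" and "entity" keys. The label of the
    first piece is kept and the scores of all pieces are averaged.
    """
    groups = []
    for pred in token_preds:
        piece = pred["word"]
        if piece.startswith("##") and groups:
            groups[-1]["word"] += piece[2:]
            groups[-1]["scores"].append(pred["score"])
        else:
            groups.append({"entity_group": pred["entity"], "word": piece, "scores": [pred["score"]]})
    return [
        {
            "entity_group": g["entity_group"],
            "score": sum(g["scores"]) / len(g["scores"]),
            "word": g["word"],
        }
        for g in groups
    ]


# Made-up input: pretend "berkeley" was split into two WordPieces
example = [
    {"word": "in", "score": 0.99, "entity": "ADP", "index": 1},
    {"word": "berk", "score": 0.8, "entity": "PROPN", "index": 2},
    {"word": "##eley", "score": 0.6, "entity": "PROPN", "index": 3},
]
print(json.dumps(group_wordpieces(example), indent=2))
```

With the pipeline p from the answer, json.dumps(group_wordpieces(p('My name is Clara and I live in Berkeley, California.')), indent=2) should produce JSON in the same shape as the one in the question.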
