spaCy Wikipedia entity linker: the resulting NLP model returns no KB entities

Problem description

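I have been learning how to use the spaCy entity linker by following the Wikipedia example.

I started by training on a small set of 2,000 articles (the run took 20 hours), but the resulting model cannot recognize or return any KB entities, not even from text that was used in training.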

# nlp_kb is an existing spaCy Language object; the post does not show how it was created
nlp_kb.from_disk("/path/to/nel-wikipedia/output_lt_kb80k_model_vsm/nlp")

text = "Anarchism is a political philosophy and movement that rejects all involuntary, coercive forms of hierarchy. It calls for the abolition of the state which it holds to be undesirable, unnecessary and harmful. It is usually described alongside libertarian Marxism as the libertarian wing (libertarian socialism) of the socialist movement and as having a historical association with anti-capitalism and socialism. The history of anarchism goes back to prehistory, when some humans lived in anarchistic societies long before the establishment of formal states, realms or empires. With the rise of organised hierarchical bodies, skepticism toward authority also rose, but it was not until the 19th century that a self-conscious political movement emerged. During the latter half of the 19th and the first decades of the 20th century, the anarchist movement flourished in most parts of the world and had a significant role in workers' struggles for emancipation. Various anarchist schools of thought formed during this period. Anarchists have taken part in several revolutions, most notably in the Spanish Civil War, whose end marked the end of the classical era of anarchism. In the last decades of the 20th century and into the 21st century, the anarchist movement has been resurgent once more. Anarchism employs various tactics in order to meet its ideal ends; these can be broadly separated into revolutionary and evolutionary tactics."


doc = nlp_kb(text)
for ent in doc.ents:
    print(ent.text, ent.label_, ent.kb_id_)

Result

the 19th century DATE 
the latter half of the 19th and the first decades of the 20th century DATE 
Anarchists NORP 
the Spanish Civil War EVENT 
the last decades of the 20th century DATE 
the 21st century DATE

The loaded NLP model has no entity linker component in its pipeline.

nlp_kb.meta["pipeline"]
['tagger', 'parser', 'ner']

But the meta.json on disk does list it.

{
  "lang": "en",
  "name": "core_web_lg",
  "license": "MIT",
  "author": "Explosion",
  "url": "https://explosion.ai",
  "email": "[email protected]",
  "description": "English multi-task CNN trained on OntoNotes, with GloVe vectors trained on Common Crawl. Assigns word vectors, POS tags, dependency parses and named entities.",
  "sources": [
    {
      "name": "OntoNotes 5",
      "url": "https://catalog.ldc.upenn.edu/LDC2013T19",
      "license": "commercial (licensed by Explosion)"
    },
    {
      "name": "GloVe Common Crawl",
      "author": "Jeffrey Pennington, Richard Socher, and Christopher D. Manning",
      "url": "https://nlp.stanford.edu/projects/glove/",
      "license": "Public Domain Dedication and License v1.0"
    }
  ],
  "pipeline": [
    "tagger", "parser", "ner", "entity_linker"
  ],

These are the contents of the NLP directory:

(spacy) ➜  nlp git:(master) ✗ ls
entity_linker meta.json     ner           parser        tagger        tokenizer     vocab

(spacy) ➜  nlp git:(master) ✗ ls -l entity_linker
total 55040
-rw-r--r--  1 staff       323 Sep  8 04:40 cfg
-rw-r--r--  1 staff  25294844 Sep  8 04:40 kb
-rw-r--r--  1 staff   2875799 Sep  8 04:40 model
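The mismatch between the meta.json on disk and the pipeline actually loaded can be checked mechanically. The following is a minimal sketch that uses only the Python standard library; the helper names `declared_pipeline`, `missing_components` and `components_without_data` are hypothetical and not part of spaCy. It assumes the model directory has the layout shown in the listing above, with one subdirectory per component next to meta.json.

```python
import json
from pathlib import Path


def declared_pipeline(model_dir):
    """Return the component names listed under "pipeline" in meta.json."""
    meta_path = Path(model_dir) / "meta.json"
    meta = json.loads(meta_path.read_text(encoding="utf8"))
    return list(meta.get("pipeline", []))


def missing_components(model_dir, loaded_names):
    """Components declared in meta.json but absent from the loaded pipeline.

    `loaded_names` is any iterable of component names, for example the
    value of `nlp_kb.pipe_names` on the loaded object.
    """
    loaded = set(loaded_names)
    return [name for name in declared_pipeline(model_dir) if name not in loaded]


def components_without_data(model_dir):
    """Declared components that have no matching subdirectory on disk."""
    root = Path(model_dir)
    return [name for name in declared_pipeline(model_dir)
            if not (root / name).is_dir()]
```

With the values from this post, `missing_components(model_dir, ['tagger', 'parser', 'ner'])` would return `['entity_linker']`, while `components_without_data(model_dir)` would return an empty list because the `entity_linker` directory exists. That combination suggests the data was saved correctly and the problem lies in how it is loaded.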

I assume I am loading the model incorrectly, but I am not sure how to fix it.
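One possible explanation, offered as an assumption and not a confirmed diagnosis: in spaCy v2.x, `Language.from_disk` restores only the components that are already present in the pipeline of the object it is called on. If `nlp_kb` was created from a stock model without an `entity_linker`, the saved linker would be skipped silently. Passing the directory to `spacy.load` instead builds the pipeline from meta.json first and then loads each component. The sketch below illustrates that idea; `load_pipeline_from_dir` and `linked_entities` are hypothetical helper names, and the behaviour has not been verified against the model from this post.

```python
from pathlib import Path


def load_pipeline_from_dir(model_dir):
    """Build the pipeline from meta.json in model_dir, then load its data.

    Assumes spaCy v2.x is installed. The import is done lazily so that
    the other helper can be used without spaCy.
    """
    import spacy

    return spacy.load(Path(model_dir))


def linked_entities(doc):
    """Return (text, label, kb_id) for entities that received a KB id."""
    return [(ent.text, ent.label_, ent.kb_id_)
            for ent in doc.ents if ent.kb_id_]
```

A typical use would be `nlp_kb = load_pipeline_from_dir("/path/to/nel-wikipedia/output_lt_kb80k_model_vsm/nlp")`, followed by checking that `"entity_linker"` appears in `nlp_kb.pipe_names` before calling `linked_entities(nlp_kb(text))`. Even with the linker loaded, an entity only gets a KB id if the knowledge base has a matching alias, so an empty result does not by itself prove a loading problem.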
